Functional Neuroanatomy of Dynamic Visuo-spatial Imagery
CHAPTER 2. METHODS

In this chapter, a selective overview of the methods used in this thesis will be provided. This overview does not claim to be comprehensive, since only issues which are directly relevant for the methods employed in the empirical chapters will be discussed. In the following subchapter, the technique of functional magnetic resonance imaging (fMRI) will be presented after a brief description of the phenomenon of nuclear magnetic resonance and of magnetic resonance imaging. In chapter 2.2, an overview of electroencephalography will be given which focuses on event-related potentials and particularly on event-related slow cortical potentials (SCPs). Finally, after a summarizing evaluation of the strengths and weaknesses of these two methods, the advantages and problems associated with their combination in a multi-modality imaging approach will be discussed.

2.1 Structural and functional Magnetic Resonance Imaging (MRI/fMRI)

Functional magnetic resonance imaging (fMRI) is a relatively new technique which entered research about 10 years ago (Ogawa et al., 1990a,b, 1992; Kwong et al., 1992; Belliveau et al., 1991). It relies on the long-known phenomenon (James, 1890; Mosso, 1881) that increases in neural activity are accompanied by distinct changes in regional cerebral blood flow (rCBF). Since fMRI relies on the phenomenon of nuclear magnetic resonance (NMR), a selective overview of the basic principles of NMR will be given now.

2.1.1 Nuclear Magnetic Resonance/Magnetic Resonance Imaging (NMR/MRI)

Soon after it became evident that the phenomenon of NMR might be used for clinical imaging, clinicians chose to name the associated technique magnetic resonance imaging (MRI) in order to avoid the negative connotations of the word "nuclear".
Although NMR would be the more precise term, since it implies that the magnetic properties of the atomic nucleus are assessed and utilized, I will adhere to this convention, since the term MRI also describes more directly how the technique is used nowadays (namely, to produce images of the human body and brain). However, it should be noted that in the following overview, the term MRI will refer exclusively to imaging using the magnetic properties of the atomic nucleus, and not those of electrons (as in Electron Spin Resonance, ESR).

MRI takes advantage of the angular momentum of atomic nuclei and of the resulting magnetic moment. The nucleus most often assessed in MRI is the hydrogen nucleus. Having angular momentum means that nuclei rotate around their own axis. In their normal state, the orientation of nuclei is randomly distributed, and no magnetic moment can be observed macroscopically. When placed in a static magnetic field, nuclei align with this external field and precess around its axis (just as a spinning wheel not only rotates around its own axis but also exhibits precession). The spin can align either in parallel to or opposite to the orientation of the static magnetic field. Spins which are aligned in the opposite direction have a higher energy level than spins aligned in parallel. The resulting population difference between the two energy levels can be observed macroscopically as a net magnetization in the direction of the static field, which is called B0. Spins precess around B0 at an angular frequency called the Larmor or resonance frequency, which depends linearly on the strength of the external field (the higher the field, the higher the Larmor frequency).
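This linear dependence can be made concrete: multiplying the field strength by the reduced gyromagnetic ratio of hydrogen (about 42.58 MHz per Tesla) gives the resonance frequency. A minimal Python sketch (the function name and the chosen field strengths are purely illustrative):

```python
# Larmor (resonance) frequency f = gamma_bar * B0 for hydrogen nuclei,
# where gamma_bar is the gyromagnetic ratio divided by 2*pi.
GAMMA_BAR_1H_MHZ_PER_T = 42.577  # MHz per Tesla, for 1H

def larmor_frequency_mhz(b0_tesla: float) -> float:
    """Resonance frequency (MHz) of 1H spins in a static field B0 (Tesla)."""
    return GAMMA_BAR_1H_MHZ_PER_T * b0_tesla

for b0 in (1.5, 3.0, 7.0):
    print(f"B0 = {b0:3.1f} T -> f = {larmor_frequency_mhz(b0):7.2f} MHz")
```

At 1.5 Tesla, the field strength of most clinical scanners, this yields a resonance frequency of about 63.9 MHz, i.e. in the radio frequency range.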
If a radio frequency (RF) pulse (an excitation pulse) is applied perpendicular to B0 with a frequency at, or at least near, the Larmor frequency, the spins will absorb energy, and transitions between the two ways of alignment and the associated energy states are induced. After the excitation pulse is switched off, the spin system regains its initial equilibrium condition (i.e., magnetization returns to the direction of B0), and this process can be picked up by specifically designed MR receiver coils. The return to equilibrium is called relaxation. Relaxation times differ between biological tissues and can thus be used to acquire images that distinguish different types of tissue. White matter, for example, shows much faster longitudinal relaxation than grey matter or cerebrospinal fluid (573 ms vs. 991 ms vs. 2063 ms, respectively, measured at 1.5 Tesla; Blüml et al., 1993). It is this sensitivity to different tissues that constitutes the basic appeal of MRI, providing a tissue contrast which is far superior to that of X-ray based brain imaging techniques such as computerized tomography.

Two different types of relaxation exist (see Fig. 2-1). Longitudinal relaxation is characterized by the T1 relaxation time. This time is defined as the time required by the system to reach 63% (1-1/e) of its equilibrium value after a 90° excitation pulse (i.e., a pulse which flips the direction of the magnetization vector from the longitudinal into the transverse plane) has been applied. T1 relaxation is also called spin-lattice relaxation, since energy is exchanged between the spins and their environment, which is called "the lattice" in condensed-matter physics. Transverse relaxation is characterized by the T2 relaxation time (the time in which the signal decays to 1/e of the original signal strength). It describes the loss of phase coherence of the spins, which causes a loss of net magnetization in the x-y plane.
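Both relaxation processes follow simple exponential laws: longitudinal magnetization recovers as Mz(t) = M0(1 - e^(-t/T1)), while transverse magnetization decays as Mxy(t) = M0 * e^(-t/T2). The Python sketch below uses the tissue T1 values quoted above to show how, at a given delay, different tissues have recovered to different degrees, which is the source of T1 contrast (the 600 ms probe time is an arbitrary illustrative choice):

```python
import math

# T1 values (ms) at 1.5 Tesla as cited in the text (Bluml et al., 1993).
T1_MS = {"white matter": 573, "grey matter": 991, "CSF": 2063}

def mz_recovered(t_ms: float, t1_ms: float) -> float:
    """Fraction of equilibrium longitudinal magnetization regained after t ms."""
    return 1.0 - math.exp(-t_ms / t1_ms)

def mxy_remaining(t_ms: float, t2_ms: float) -> float:
    """Fraction of transverse magnetization remaining after t ms of T2 decay."""
    return math.exp(-t_ms / t2_ms)

# At t = T1, exactly 1 - 1/e (~63%) of Mz has recovered: the definition of T1.
for tissue, t1 in T1_MS.items():
    print(f"{tissue:12s}: {mz_recovered(t1, t1):.3f} recovered at t = T1 = {t1} ms")

# The differing T1 values translate into image contrast: after 600 ms, white
# matter has regained far more magnetization than CSF.
print(mz_recovered(600, 573), mz_recovered(600, 2063))
```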
This loss of phase coherence mainly results from an exchange of energy between the spins due to their magnetic interactions, which causes continuous changes in precession frequencies. Thus, T2 is also called the spin-spin relaxation time. Additional dephasing of the spins is introduced by local inhomogeneities in the external magnetic field, with larger inhomogeneities causing more rapid dephasing and loss of signal. The combined effect of this "instrument-induced" dephasing and "true" T2 dephasing is assessed via the T2* relaxation time. T1 and T2 (T2*) relaxation occur simultaneously, but at different rates (in biological tissues, 1/T2 is much higher than 1/T1). Thus, different pulse sequences can be used to produce data which are differentially weighted by the two relaxation parameters. Most functional imaging sequences are designed to obtain T2*-weighted images in order to take advantage of the phenomenon that blood with a lower concentration of deoxy-hemoglobin causes fewer inhomogeneities than blood with a higher concentration (see below).

Fig. 2-1: T1 and T2 relaxation curves for two different tissue types (dashed and solid lines). While T1 is defined as the time required to regain the fraction 1-1/e of the original longitudinal magnetization, T2 is defined as the time in which the signal decreases to the fraction 1/e of its original amplitude (reproduced from Aine, 1995, Fig. 4).

Up to now, it would have been more appropriate to use the term NMR instead of MRI, since we would not be able to obtain a structured image with the principles presented so far. Imaging requires not only a static magnetic field and an RF transmitter and receiver coil, but also the application of so-called gradients (see Fig. 2-2 and Fig. 2-3). Gradients consist of coils through which an electric current is passed. This results in a magnetic field, which is added to the main static magnetic field B0.
Consequently, the effective field strength at a given position within the magnet bore is a function of the main magnetic field and of the imaging gradients switched on and off for imaging. Thus, gradients can be used to define cubes (voxels) of distinct magnetic field strength in the investigated sample. Since a difference in magnetic field strength is associated with a difference in the Larmor frequency of the spins within a voxel, signals of different frequencies can be assigned to these voxels. Three different gradients are required to encode the three spatial dimensions. They are referred to as the slice selection (z), frequency encoding (x), and phase encoding (y) gradients (see Fig. 2-3). Usually, the first step in imaging is the selection of a slice of the whole volume (hence the name magnetic resonance tomography, since the ancient Greek word "tomos" means slice, or section). This is achieved via the application of a gradient along the z-axis. As a result, nuclei will show different precession frequencies as a function of their position along the longitudinal axis of the magnet. Since this also implies that the Larmor frequencies vary, RF pulses can now be applied to selectively excite different slices of the biological sample (the brain, in our case). These RF pulses only contain frequencies at or near the Larmor frequency of the slice to be excited.

Fig. 2-2: A standard clinical MRI tomograph consists of a magnet, which is usually superconductive and produces the main magnetic field (B0), several gradient coils which produce local variations of the main magnetic field, and a high frequency coil (in this case a bird-cage coil for head scanning) for transmission and reception of high frequency radio waves (adapted from Kischka et al., 1997, Fig. 13.10).

After slice selection, the selected slice has to be additionally subdivided in order to assess the signal within the slice.
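Quantitatively, the slice-selection gradient makes the Larmor frequency position-dependent, and the bandwidth of the RF pulse then determines the thickness of the excited slice. A small Python sketch of these two relationships (the 10 mT/m gradient strength and 2 kHz bandwidth are illustrative values, not parameters of the experiments in this thesis):

```python
GAMMA_BAR_HZ_PER_T = 42.577e6  # 1H gyromagnetic ratio / (2*pi), in Hz/Tesla

def frequency_offset_hz(z_m: float, gz_t_per_m: float) -> float:
    """Larmor-frequency offset at position z (m) under a gradient Gz (T/m)."""
    return GAMMA_BAR_HZ_PER_T * gz_t_per_m * z_m

def slice_thickness_m(rf_bandwidth_hz: float, gz_t_per_m: float) -> float:
    """Thickness of the slice excited by an RF pulse of the given bandwidth."""
    return rf_bandwidth_hz / (GAMMA_BAR_HZ_PER_T * gz_t_per_m)

gz = 10e-3  # a 10 mT/m slice-selection gradient (illustrative)
print(f"offset at z = +5 mm: {frequency_offset_hz(0.005, gz):.0f} Hz")
print(f"a 2 kHz RF pulse excites a {slice_thickness_m(2000, gz) * 1e3:.1f} mm slice")
```

The subdivision of the excited slice itself is then the job of the phase and frequency encoding gradients, described next.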
This is achieved via a combination of phase and frequency encoding. Phase encoding is achieved via a gradient perpendicular to the slice-selection gradient, usually in the y-direction. This gradient causes differences in the velocity of precession, which results in a difference in phase between spins along the y-axis. Phase encoding is achieved in several phase encoding steps, which in echoplanar imaging (EPI; see below) consist of rapidly repeated blips. For frequency encoding, a gradient is applied in the direction perpendicular to the phase encoding and slice selection directions (usually the x-axis). Since this gradient is switched on during the read-out period (the period when the MR signal is acquired), it is also referred to as the readout gradient; it produces a distribution of frequencies along the x-axis. Using two-dimensional Fourier transformation of the acquired signals, images can be reconstructed whose spatial resolution depends on the resolution of the three gradients and the bandwidth of the slice selection RF pulse. Fig. 2-4 shows a typical pulse sequence used for structural brain imaging.

Fig. 2-3: Spin vector orientations related to the application of the slice, phase and frequency encoding gradients (reproduced from Cohen, 1996, Fig. 10).

Summing up, from a layman's perspective, the basic principle of MRI can be simplified as follows: Hydrogen nuclei are placed in an external magnetic field, which "forces" them to align with this field. Radio waves are then used to disturb this alignment for a very short period of time. When the radio waves are switched off, the nuclei return to their initial state. This process is picked up by a special receiver, with the induced voltage in the receiver coil being proportional to the magnetic properties of the tissue containing the nuclei.
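The role of the two-dimensional Fourier transformation can be illustrated with a toy reconstruction: if the acquired, frequency- and phase-encoded signal is treated as the k-space (spatial-frequency) representation of the slice, an inverse 2-D Fourier transform recovers the image. The rectangular "phantom" below is, of course, a synthetic stand-in for real k-space data:

```python
import numpy as np

# A 64 x 64 synthetic phantom with one bright rectangular "tissue" region.
phantom = np.zeros((64, 64))
phantom[24:40, 20:44] = 1.0

# What frequency and phase encoding effectively measure is the 2-D Fourier
# transform of the slice (k-space); the inverse transform reconstructs it.
k_space = np.fft.fft2(phantom)
reconstruction = np.fft.ifft2(k_space)

print(np.allclose(reconstruction.real, phantom))  # True: the image is recovered
```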
By using three gradients, the three-dimensional position of the origin of the signals can be determined, and this information can be used to produce an image of the scanned volume. For a physically exact account of MR imaging, the reader is referred to, e.g., Morris (1987).

Fig. 2-4: Spin echo (SE) pulse sequence. Note the timing of the three different gradients. While the slice selection gradient is switched on during RF excitation, the readout or frequency encoding gradient is on during signal acquisition (which is, in the case of a SE sequence, during echo acquisition). The shown excitation-inversion-echo sequence has to be repeated 128 to 256 times in order to achieve phase encoding (see the phase encoding gradient; reproduced from Aine, 1995, Fig. 10).

2.1.2 Functional Magnetic Resonance Imaging (fMRI)

2.1.2.1 Blood oxygen level dependent (BOLD) contrast

fMRI is based on the principles of NMR plus the phenomenon that increases in neural activity are accompanied by local increases in blood flow and blood oxygenation. As such, fMRI is a truly non-invasive method, since it uses the intrinsic contrast agent of blood oxygenation to assess neural activity in the brain. However, the first attempts to use MRI as a functional imaging technique in humans used an extrinsic contrast agent. Belliveau et al. (1991) intravenously injected Gd(DTPA) (gadolinium diethylenetriaminepentaacetic acid) and observed an increase in blood volume in the primary visual cortex during photic stimulation. Although this was one of the first demonstrations of the applicability of MRI as a tool for functional imaging, it did not have many advantages over the then-dominant invasive imaging technique of positron emission tomography (PET), since Gd(DTPA)-MRI is invasive as well (with DTPA even being a toxic substance).
But only one year before Belliveau & colleagues published their results, it had been demonstrated in animal experiments that the intrinsic properties of blood can be used to noninvasively map activity-related hemodynamic changes (Ogawa et al., 1990a). One year later, the usefulness of so-called blood oxygen level dependent contrast (BOLD contrast) for mapping human brain activity was presented independently by Ogawa & colleagues and by Kwong & colleagues (1992; ironically in the same issue of the Proceedings of the National Academy of Sciences of the USA, after both groups' papers had been rejected by Nature and Science with the argument that nothing new was presented; Raichle, 2000). The initial observation that Ogawa and his collaborators made was that varying the percentage of oxygen in the air inhaled by rats changed the brightness of MR images of their brains. When a rat was ventilated with less oxygen, its brain appeared much darker than when the air was highly saturated with oxygen (see Fig. 2-5).

Fig. 2-5: Coronal images of the brain of a rat during inhalation of air with differing oxygen concentration (Ogawa et al., 1990a). Inhalation of air with a higher percentage of oxygen (upper image) resulted in the disappearance of the dark lines visible in the lower image (adapted from Raichle, 2000, Fig. 18).

This is due to the following phenomenon: Blood consists of liquid plasma and of erythrocytes, leukocytes and thrombocytes. Erythrocytes or red blood cells are responsible for the supply of oxygen to metabolically active cells in the human body, and thus also in the brain. This is achieved through the binding of oxygen to hemoglobin, with hemoglobin being composed of globin and heme (the latter gives blood its red color). The center of heme consists of an iron atom (Fe) which is capable of binding O2.
When an oxygen molecule is bound to the iron atom, heme changes its magnetic state from paramagnetic to diamagnetic. The opposite applies when the oxygen is taken up by the local environment, that is, when hemoglobin is deoxygenated. While de-oxygenated erythrocytes show positive magnetic susceptibility, the surrounding environment is diamagnetic and has negative magnetic susceptibility. This difference in magnetic susceptibility creates a local magnetic field gradient and, consequently, "inhomogeneities" in the magnetic field. Therefore, deoxygenated blood acts as a paramagnetic "contrast agent" which dephases the spins, resulting in faster signal loss and a decreased T2* time. Now let us consider what happens when neurons become active, e.g. by increasing their firing rate. The oxygen demand of these neurons increases, and the oxygen supply in the environment decreases. Via several physiological mechanisms (see, e.g., Villringer, 1999), this triggers an increase in the delivery of arterial blood (containing oxygenated erythrocytes) near the activated area in order to avoid a shortage of oxygen supply. While regional blood flow and blood volume increase considerably (up to ~50%), blood oxygen extraction increases only slightly (as demonstrated by PET studies; Fox et al., 1988; see also Jueptner & Weiller, 1995), resulting in a "paradoxically" higher concentration of oxygenated blood in the activated area (paradoxical because it had previously been expected that the ratio of oxygenated to de-oxygenated blood should become lower due to the increase in oxygen consumption). As a consequence, the blood flowing through the vessels now has a magnetic susceptibility similar to that of the surrounding tissue, which reduces the strength of the local field gradient in and around the vessels (see Fig. 2-6).
This causes less dephasing of the spins and thus a slower signal decay, which is reflected in a brighter image in a T2*-weighted sequence optimized to detect such changes in local magnetic field inhomogeneities. Thus, most sequences used in fMRI are T2*-weighted, and they are often referred to as BOLD-weighted sequences.

Fig. 2-6: Schematic description of the phenomena underlying BOLD contrast. (a) Without active neurons, paramagnetic deoxygenated hemoglobin (blue dots) gives rise to local magnetic field gradients (indicated by the cone-like structure). (b) Neural activity leads to an increase in blood flow and a higher amount of diamagnetic oxygenated hemoglobin (red dots). This causes an increased homogeneity of the local magnetic field and, therefore, slower signal decay/longer T2* times (reproduced from Windischberger, 1998, Fig. 31).

The change in hemodynamic response, which leads to a detectable change in magnetic susceptibility, occurs at a much slower rate than the changes in neuronal activity. This usually results in a delay of several seconds before signal strength increases. This delay can show considerable variability between brain regions (see, e.g., Buckner et al., 1998), which seems to result from differences in the capillary, neuronal and synaptic density of these regions.

2.1.2.2 Imaging sequences used for BOLD-fMRI: echoplanar imaging (EPI)

Although blood flow changes at a much lower rate than neuronal activity, the tracking of such changes requires fast imaging sequences. The imaging sequence most often (by now almost exclusively) used in fMRI is called echo planar imaging (EPI). EPI allows the acquisition of an image in well under one second. Although the theoretical basis of EPI was conceived as early as 1977 by Mansfield, the technique had to wait until the nineties, when improved hardware and software allowed its application. Cohen (1999, p. 137) impressively demonstrates the temporal advantage of EPI over conventional imaging sequences: "While MRI, as practiced conventionally, builds up the data for an image from a series of discrete signal samples, EPI is a method to form a complete image from a single data sample, or a single "shot" (...) For example, a typical T2-weighted imaging series (...) requires that the time between excitation pulses, known as TR, be two to three times longer than (...) T1. The T1 of biological sample is typically on the order of a second or so (...); TR must therefore be 3 sec or more. A more or less typical MR image is formed from 128 repeated samples, so that the imaging time for our canonical T2-weighted scan is about 384 s, or more than 6.5 min. By comparison, the EPI approach collects all of the image data, for an image of the same resolution, in 40-150 ms (depending on hardware and contrast considerations). This reflects a nearly 10,000-fold speed gain."

This fundamental difference between conventional and EP imaging becomes obvious when Fig. 2-4 and Fig. 2-7 are compared. Both sequences are spin-echo sequences (which means that the dephasing signal is rephased using a 180° RF pulse following a 90° excitation pulse, evoking an echo when all spins are in phase again). However, while conventional imaging requires a separate RF excitation for each phase encoding step, EPI uses short blips in the phase encoding direction to acquire all data with one single shot. This allows the acquisition of multi-slice images of the whole brain in about 1 to 2 seconds, depending on the in-plane spatial resolution and the slice thickness.

Fig. 2-7: Example of an echo-planar pulse sequence. In contrast to the spin-echo sequence depicted in Fig. 2-4, only a single excitation and inversion pulse has to be applied to acquire an image, since phase encoding is achieved via several short "blips" of the phase encoding gradient. This requires special hardware (ultrafast gradient switching), but results in a significant reduction of acquisition and repetition time (reproduced from Cohen, 1999, Fig. 13.4).

The application of EPI to functional imaging depends on several (mainly hardware) requirements. High performance gradients with rapid rise times, high peak amplitudes, high accuracy and low eddy currents are required (eddy currents are the main reason for so-called ghost artifacts, which can cause serious problems in EPI). However, the power of the gradients cannot be increased indefinitely, since the rapid changes in magnetic field produced by high-performance gradients can induce currents in the human body, which might lead to sensory stimulation. Also, the windings of the gradient coils produce considerable noise, due to the rapidly changing forces related to the switching of the currents sent through them.

2.1.2.3 Stimulation paradigms in fMRI: event-related and single-trial fMRI

Data acquisition, data analysis and the mode of stimulus presentation are tightly related in fMRI. Since fMRI was originally inspired by PET and similar methods of rCBF measurement, stimulus presentation paradigms and data analysis approaches were largely borrowed from PET in the "early ages" of fMRI. This meant that tasks were presented block-wise, i.e. several tasks were presented in succession, and the signal acquired during interleaved "off" or "control" blocks was subtracted from that acquired during the "on" blocks.
Although this strategy has several disadvantages (unspecific activity related to the maintenance of attention and effort during a block is mixed with task-specific activity, and correctly and incorrectly answered tasks are mixed in the analysis), one important reason for the preference for blocked designs was that even EPI repetition times were generally in the range of several seconds if multi-slice scanning was desired. This was due to limitations of the gradient hardware as well as of the computer hardware (network connections, disk speed etc.) available in the early nineties. Since such long repetition times would have resulted in too few data points being collectable during single task executions, tasks were presented block-wise to obtain a stable and reliable measure of the hemodynamic changes related to task processing. These hardware limits have now largely been overcome, and the time required to scan the whole brain of a subject with multi-slice EPI-fMRI seems to be ever decreasing (in the experiments performed for this thesis, which were started in early 1999, 15 slices were acquired in 1.5 sec; meanwhile, 20 and more slices with even slightly better spatial resolution can be acquired in about the same time with the same scanner, due to a faster gradient system and more efficient computing technology). Thus, it has become more and more common to use a single-trial acquisition scheme. In single-trial fMRI, stimuli are presented just as in ERP experiments, i.e. trial by trial. Since each trial has its own individual reference or baseline (the signal sampled immediately before task presentation), the single-trial acquisition mode minimizes the problem blocked designs have in separating stimulus- or task-related from state-related activity. In blocked designs, a separate or interleaved block with a control task is required.
This "control" or "off" block either has to be subtracted from the active "on" block or modeled as a covariate. In order to be sure that differences between blocks are exclusively related to stimulus- or, in the case of cognitive tasks, cognition-related differences, it has to be assumed (apart from the assumption of "pure insertion", which might not always apply to cognitive paradigms; Sidtis et al., 1999) that state-related activity is constant across the control and task blocks, and that the baseline stability of the MR scanner is sufficient. Although these assumptions, of course, also hold for single-trial fMRI, the short time interval between "control" (baseline) and task should reduce violations of these assumptions. Instead of fixed inter-stimulus intervals (which might evoke expectancy-related activity in the pre-stimulus baseline), variable inter-stimulus intervals, or even a subject-paced stimulus presentation mode (subjects decide individually when the next item shall be presented), are possible. The main advantage of the latter is that subjects are allowed to take short breaks between tasks and to call up an item only when they feel ready to process it. This is again in contrast to block designs, in which longer breaks between task items are not possible due to the requirement of constant cognitive and neural activity during the whole block epoch. Single-trial acquisition also allows for event-related averaging of the single-trial responses in order to increase the signal-to-noise ratio of the measurements. Such averages can either contain all single trials, or can be sorted post-experimentally to specifically assess certain task types (e.g., difficult and easy ones) or task responses (e.g., correct or incorrect ones). One of the best examples of this strategy is the use of single-trial fMRI data in memory research. In a typical memory experiment, subjects would be presented with items (e.g., words, faces, ...)
that had or had not been presented in a previous part of the experiment. Many ERP experiments have revealed that brain responses to, e.g., correctly memorized items (hits) vs. correctly rejected items (correct rejections) differ considerably. Since it is not known before an experiment which items will be answered correctly, block designs hardly allowed investigations of such item- or category-specific activity.

Another central advantage of single-trial fMRI is that data acquisition is stimulus- and response-locked. This, in principle, allows a tracking of the interaction between brain regions which are involved in the solving of a complex cognitive task (see Humberstone et al., 1997). For example, brain activity related to the perception and perceptual encoding of a visual stimulus can be separated from activity related to the cognitive operations executed upon the resulting visual representation, and from activity associated with response preparation and response execution. However, due to the rather low temporal resolution or "sluggishness" of the hemodynamic response, a separation of rapidly changing cognitive processes will still be rather difficult and will require sophisticated stimulus presentation protocols and analysis methods (see, e.g., Menon et al., 1998; Kim et al., 1997). Data acquired in single-trial mode can also be used to perform time-resolved analyses (Richter et al., 1997a,b, 2000), in which the width and the onset of the signal changes in different brain areas are correlated with the processing time of the corresponding task trial. Depending on which aspect of the task a brain region correlates with, it might be inferred whether activity in this region is task-specific or only related to some peripheral, unspecific or constant aspect of task solving, such as response execution.
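The single-trial logic described above, per-trial baseline correction, event-related averaging, and post-hoc sorting, can be sketched in a few lines of Python; the trial structure, noise level and "correctness" labels below are simulated purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated single-trial time courses (n_trials x n_scans): two pre-stimulus
# baseline scans followed by a smooth task-evoked response, plus noise.
n_trials, n_scans, n_baseline = 40, 12, 2
response = np.concatenate([np.zeros(n_baseline), np.hanning(n_scans - n_baseline)])
trials = response + rng.normal(0, 0.5, size=(n_trials, n_scans))

# Hypothetical post-experimental sorting criterion, e.g. response correctness.
correct = rng.random(n_trials) < 0.8

def event_related_average(data: np.ndarray, n_baseline: int) -> np.ndarray:
    """Baseline-correct each trial by its own pre-stimulus scans, then average."""
    baseline = data[:, :n_baseline].mean(axis=1, keepdims=True)
    return (data - baseline).mean(axis=0)

avg_all = event_related_average(trials, n_baseline)          # all trials
avg_correct = event_related_average(trials[correct], n_baseline)  # sorted subset
print(avg_all.shape, avg_correct.shape)
```

Averaging over 40 trials reduces the noise standard deviation by a factor of sqrt(40), which is exactly the signal-to-noise argument made above.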
Summing up, stimulus presentation and data acquisition in single-trial mode have a number of advantages compared to the classical blocked stimulation protocols. However, in order to obtain sufficient data quality, several requirements have to be met (see, e.g., Ugurbil et al., 1999; Thulborn et al., 1999). Chief amongst them is a magnetic field of sufficient strength to obtain a signal amplitude which is high enough to be detected during single task executions. This is the reason why almost all single-trial studies have been performed at a magnetic field strength of 3 Tesla or higher (the most frequently used clinical MR scanners usually "only" provide field strengths of 1.5 Tesla). In addition, high-performance (ultra-fast switching) gradients, excellent homogeneity of the static magnetic field, and high-performance RF coils are mandatory.

2.1.2.4 Analysis of fMRI data

An enormous amount of data is acquired and stored in a typical fMRI experiment. An EPI image slice usually consists of 64 x 64 pixels; if a TR of 1 sec and a trial duration of, say, 10 sec are assumed, 40960 pixel values have to be stored for a single trial and a single slice. If it is additionally assumed that a typical experiment requires about 40 task repetitions, and that 15 or even more slices are acquired, more than 24 million individual data values have to be analyzed in order to reveal the task-related brain activity of only one subject. Although fMRI is an image-oriented technique, it seems evident that image analysis cannot be based on a visual inspection of the spatiotemporal characteristics of these data, a strategy which frequently represents the first analysis step in ERP research. Thus, in fMRI, data have to be massively reduced.
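The arithmetic behind these figures is easily checked:

```python
# Back-of-the-envelope data volume for the example given in the text:
# 64 x 64 pixels per slice, TR = 1 s, 10 s per trial, 40 trials, 15 slices.
pixels_per_image = 64 * 64
scans_per_trial = 10  # one image per second for a 10-s trial (TR = 1 s)
values_per_trial_per_slice = pixels_per_image * scans_per_trial
print(values_per_trial_per_slice)       # 40960 values, as stated above

total = values_per_trial_per_slice * 40 * 15  # 40 trials, 15 slices
print(f"{total:,} values per subject")  # 24,576,000: "more than 24 million"
```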
Several image processing strategies exist to extract the relevant information from these vast amounts of data (see, e.g., Lange et al., 1999, or Lange, 1996, for comprehensive and extensive overviews); in fMRI, "relevant" usually means relevant with respect to functional neuroanatomy or to neurological or neuropsychological questions. These strategies can be subdivided into paradigm-based and paradigm-free methods. While paradigm-based analysis approaches rely on testing whether and how well the data can be fitted with a pre-specified model (e.g., whether a certain time-course is or is not contained in certain pixels), paradigm-free methods do not test existing models or hypotheses, but "look" at the data in an explorative manner. Fuzzy Cluster Analysis (FCA) is one interesting example of the latter type of method. However, since mainly results obtained with paradigm-based methods will be presented in this thesis, the reader interested in FCA is referred to a number of papers published by the NMR group here in Vienna, which has specialized in FCA of fMRI data (see, e.g., Moser et al., 1997, 1999; Baumgartner et al., 1997).

Fig. 2-8: Time course of signal intensity in two different regions of the brain during the solving of a cognitive task (see chapter 6 for details). Task onset and task completion are indicated by red and blue arrows. Note the "ramp-like" time-course of signal intensity.

One of the first paradigm-based analysis concepts for fMRI was introduced by Bandettini et al. (1993). They suggested that a high correlation of the observed signal time course with a pre-defined reference function should indicate task-related signal changes. As a model of the true hemodynamic response, a boxcar or trapezoidal reference function was proposed which, depending on the TR, might also be shifted in time to account for the above-mentioned delay in hemodynamic response.
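This correlation strategy can be sketched as follows; the data are synthetic, and the block length, shift and amplitudes are illustrative choices, not values from the cited study:

```python
import numpy as np

rng = np.random.default_rng(1)

def boxcar(n_scans: int, block: int, shift: int = 2) -> np.ndarray:
    """Boxcar reference: 0 during "off", 1 during "on" blocks, optionally
    shifted by a few scans to account for the delayed hemodynamic response."""
    ref = np.tile(np.r_[np.zeros(block), np.ones(block)], n_scans // (2 * block))
    return np.roll(ref, shift)

n_scans, block = 80, 10
ref = boxcar(n_scans, block, shift=2)

# Two synthetic voxel time courses: one task-related, one pure noise.
active_voxel = 2.0 * ref + rng.normal(0, 1, n_scans)
silent_voxel = rng.normal(0, 1, n_scans)

r_active = np.corrcoef(active_voxel, ref)[0, 1]
r_silent = np.corrcoef(silent_voxel, ref)[0, 1]
print(f"r(active) = {r_active:.2f}, r(silent) = {r_silent:.2f}")
# Voxels whose r exceeds a chosen threshold are declared task-related.
```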
It has been demonstrated (Lange, 1996; Klose et al., 1999) that this approach is equivalent to computing t-tests of signals acquired during active ("on") vs. inactive ("off") phases of a trial, an approach which was chosen in this thesis. The correlation analysis approach is easy to implement, does not require high-performance computing devices, and is quite straightforward. Its main assumption concerns the shape of the hemodynamic response. Although a boxcar or trapezoidal reference function might be a rough approximation of this hemodynamic response, Fig. 2-8 shows that the actual signal time-course does not always resemble a boxcar. Another problem with this approach is that a threshold has to be set in order to define when a certain correlation coefficient is believed to indicate significant physiological responses as opposed to random signal fluctuations. This, however, is one of the major "data reduction" tasks that all fMRI analysis methods have to accomplish. Although correlation analysis is still used by many groups, it is being replaced more and more by analyses using the Statistical Parametric Mapping (SPM) software. The corresponding MATLAB-coded software is freely available for research purposes (http://www.fil.ion.ucl.ac.uk/spm). SPM was and is continuously developed by the methodology group of the Wellcome Department of Cognitive Neurology, London, UK, under the supervision of Karl Friston. Common to all analyses performed with SPM is the use of the General Linear Model (GLM) to assess the variability of data in terms of experimental effects, confounding effects, and residual error. SPM was originally developed for the analysis of rCBF data collected using PET or SPECT, but was soon adopted by the fMRI community, which also led to the development of an fMRI-specific SPM module (version SPM99).
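The equivalence noted by Lange (1996) can be illustrated numerically with simulated data; the identity t = r * sqrt(df / (1 - r^2)) used below is the standard relation between a point-biserial correlation and the two-sample t statistic:

```python
import numpy as np
from scipy import stats

# Numerical illustration of the equivalence between correlating a voxel
# time course with a boxcar and t-testing "on" vs. "off" scans. The
# signal, effect size and noise are simulated, not real data.
rng = np.random.default_rng(1)
n = 40
boxcar = np.repeat([0.0, 1.0, 0.0, 1.0], 10)     # two off/on cycles
y = 1.5 * boxcar + rng.normal(0, 1, n)           # synthetic voxel signal

r = np.corrcoef(y, boxcar)[0, 1]
t_from_r = r * np.sqrt((n - 2) / (1 - r ** 2))
t_direct, _ = stats.ttest_ind(y[boxcar == 1], y[boxcar == 0])
# t_from_r and t_direct coincide (equal-variance t-test, df = n - 2).
```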
It is even planned to make SPM a general-purpose analysis tool with which all kinds of brain imaging (including EEG and MEG) data can be analyzed (Stefan Kiebel, personal communication; see Footnote 3). Since SPM represents a central analysis concept in fMRI research (see Footnote 2) and was also used in the group-analysis of the study presented in chapter 7, I will briefly describe the steps involved in the "SPMing" of fMRI data.

Footnote 2: This is also indicated by a survey of the fMRI papers in the year 2000 issues of the journals NeuroImage and Human Brain Mapping (HBM). In 25% of the HBM fMRI papers, SPM was used. About 50% of the remaining reports used correlation analysis or t-tests. In the journal NeuroImage, a much higher percentage (63%) of SPM-based fMRI analyses was observed. Again, about 50% of the remaining reports used correlation analysis or t-tests.

Footnote 3: Indeed, some EEG studies already exist in which SPM-like concepts were used in data analysis (Yoshino et al., 2000).

2.1.2.4.1 Analysis of fMRI data: Statistical Parametric Mapping (SPM)

The analysis of fMRI data with SPM requires both patience and time. A number of consecutive and computationally demanding analysis steps have to be accomplished to finally obtain indices of activity at the voxel- and cluster-level (see Fig. 2-9). Following several preprocessing steps (image reconstruction, motion correction, correction of slice timing), data are transformed into a common stereotactic space. This is achieved via normalization of the single-subject data to a template provided by the Montreal Neurological Institute (MNI; see http://www.bic.mni.mcgill.ca).
The main reason for this rather time-consuming and far from trouble-free procedure is to allow for multi-subject comparisons, as well as to obtain a standard framework for reporting results using the three stereotactically defined image coordinates of active voxels (which, however, do not exactly match the coordinates reported by Talairach & Tournoux, 1988; see Brett, 1999). In SPM99, normalization is fully automatic and is achieved via several affine transformations (rotation, translation, shearing, zooming) which match the original image to the template brain using least-squares optimization. Following normalization, images are spatially smoothed by convolving them with a Gaussian kernel. This is done for several reasons. One of them is to improve the signal-to-noise ratio. According to the matched filter theorem, smoothing data with a filter that matches the signal will increase the signal-to-noise ratio of this signal. In the case of spatial smoothing, however, this would not only require advance knowledge of the size of the active regions/image clusters, but also a similar or identical size of active clusters in different brain regions, which usually does not apply. As a consequence, either several analyses are performed using smoothing with different kernel sizes (which not only increases computation time, but also the probability of false positives), or a kernel size of about 2-3 times the voxel size is chosen according to some "tacit" convention. Another reason to perform smoothing (and to choose different levels of smoothing) is to account for interindividual differences in both structural and functional neuroanatomy. Depending on the hypothesis, smoothing kernels of up to 20 mm full-width-at-half-maximum (FWHM), i.e. about 7 times the most common spatial resolution of EPI-fMR images, can be chosen for group analyses (C. Büchel, personal communication).
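As a small illustration of such smoothing, the sketch below applies a Gaussian kernel of a given FWHM to a synthetic image; the voxel size and FWHM are assumed values, and FWHM = sigma * sqrt(8 ln 2) is the standard conversion between the two kernel specifications:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Sketch of SPM-style spatial smoothing on synthetic data.
voxel_size = 3.0                     # mm (hypothetical in-plane resolution)
fwhm = 8.0                           # mm (roughly 2-3 voxels, as in the text)
sigma_vox = fwhm / np.sqrt(8 * np.log(2)) / voxel_size

image = np.zeros((64, 64))
image[32, 32] = 1.0                  # a single "active" voxel
smoothed = gaussian_filter(image, sigma=sigma_vox)
# The point response is spread over neighbouring voxels while the total
# signal is conserved; noise is averaged out, but small foci are diluted.
```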
Last, but not least, spatial smoothing with a Gaussian kernel is applied in order to make it more likely that data are distributed according to a normal distribution, which is a central requirement of all kinds of GLM analyses. Following smoothing, the GLM is used to investigate whether several experimental variables and confounds can be used to predict or model the actually measured data. For example, if a boxcar function determined by the stimulus onset and offset times is used as a predictor, it can be investigated whether and where in the brain it contributes to a better prediction of the data. If the predictor of a certain voxel explains none of the variance of the signal fluctuations in that voxel, this basically means that there is no task-related signal change in this area. In principle, this is an extended version of the t-test or correlation analysis approach presented above (since both can be formulated as a special case of the GLM). However, there are two important (and optional) differences in SPM. One is that SPM allows additional modeling of the data with confounds, reducing the residual error in the linear regression. The other, more important difference is that SPM allows the use of different basis or reference functions.

Fig. 2-9: Overview of the processing steps that have to be accomplished when using SPM for data analysis. Following normalization to a template, data are spatially smoothed with a Gaussian kernel. The normalized and smoothed data are then modeled, and statistical maps of parameter estimates are calculated. Statistical significance of these maps can be assessed via the theory of random Gaussian fields (reproduced from http://www.fil.ion.ucl.ac.uk/spm/course/notes00/Ch1slides00.pdf).
These functions can be used to model the time-course of the data with more sophisticated models than represented by a boxcar reference or a t-test (a t-test basically is a boxcar reference with value 0 in the "off"- and value 1 in the "on"-period; see above). The most frequently used basis function is the canonical hemodynamic response function (hrf; see Fig. 2-10). Other approaches, such as the hrf and its temporal derivative, a Fourier set, or a three-gamma-functions basis set, can be chosen to model other time-courses in the data and to yield a more accurate fit of the model to the data. The more of the time-courses potentially contained in the data are modeled, the more accurate the model will be. However, this comes at the cost of losing degrees of freedom for statistical inference, and basis functions such as the Fourier set cannot be interpreted as straightforwardly as the canonical hrf basis function. After the explanatory variables (predictors and, if any, confounding effects) and the basis function(s) have been specified, they are entered into the design matrix of the GLM. Each column of the design matrix corresponds to an experimental effect or an effect that is considered to confound the data (including low-pass and high-pass filters). Parameter estimation is then performed using least-squares fitting, resulting in statistical parametric maps based on these parameters (hence the name SPM). These SPMs are thresholded in order to decide whether any of the explanatory variables affects the data in a systematic way. The threshold is defined via critical t- or F-values derived from the t- or F-distribution.

Fig. 2-10: Canonical hemodynamic response function and its temporal derivative, used by SPM99 for the modeling of the time-course of fMRI data (figure extracted from the SPM99 software).
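The core of the GLM step can be sketched in a few lines; this is a deliberately stripped-down toy model (no hrf convolution, no confounds), and the effect sizes, block lengths and noise level are all invented:

```python
import numpy as np

# Minimal GLM sketch: a design matrix with one boxcar predictor plus a
# constant term, fitted by least squares. SPM additionally convolves
# predictors with the canonical hrf and includes confound regressors
# (filters, drifts, etc.).
rng = np.random.default_rng(2)
n = 60
boxcar = (np.arange(n) // 10) % 2                # alternating 10-scan blocks
X = np.column_stack([boxcar, np.ones(n)])        # design matrix
beta_true = np.array([2.0, 100.0])               # effect size, baseline
y = X @ beta_true + rng.normal(0, 1, n)          # simulated voxel signal

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta_hat
# beta_hat[0] recovers the task-related signal change (close to 2.0);
# a t statistic on beta_hat[0] would be one entry of a statistical map.
```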
These critical values have a certain probability of being observed given that the null hypothesis is true, which might for example be that the presentation of a visual stimulus has no effect upon blood oxygenation in primary visual cortex. It is here where the problem of multiple comparisons, which is one of the most difficult challenges in the statistical analysis of functional brain images, has to be considered. The core problem is that the probability of obtaining a false positive, i.e. of erroneously rejecting the null hypothesis, rapidly increases with the number of statistical comparisons. As already stated, a typical fMRI image slice consists of 64 x 64 pixels. If we assume that about 2/3 of these pixels cover the brain, and if we assume a significance threshold of α ≤ .05 (which implies that we will on the average commit a so-called type I error in 5 % of all comparisons), about 136.5 of the statistical comparisons in this slice will on average be significant by chance alone. Put another way, even if H0 holds for all pixels, the probability of obtaining at least one pixel that survives the chosen threshold is virtually 1! Thus, even if no task-related brain activity exists, we will nevertheless observe some pixels surviving the chosen threshold in our images. Several solutions have been conceived to escape this situation and to maintain the family-wise probability of committing type I errors at the originally chosen α-level. One of them is Bonferroni correction, which means that the significance threshold is corrected for the number of comparisons m according to the formula α(corrected) = 1-(1-α)^(1/m), with α being the uncorrected level of significance (Bortz, 1993). For example, if we had three tests, the corrected significance threshold would be α(corrected) ≈ 0.017, so the probability of obtaining a type I error in at least one of these tests would be P = 1-(1-0.017)^3 ≈ .05.
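The numbers in this example can be reproduced directly (the exact correction 1-(1-α)^(1/m) used here is often approximated by the simpler Bonferroni rule α/m, which gives nearly the same threshold for small α):

```python
# Reproducing the multiple-comparisons arithmetic from the text.
m, alpha = 3, 0.05

# Exact single-test threshold that keeps the family-wise error at alpha
# for independent tests (alpha/m is the common Bonferroni shortcut):
alpha_corr = 1 - (1 - alpha) ** (1 / m)          # ~0.0170
fwe = 1 - (1 - alpha_corr) ** m                  # back to ~0.05

# Expected chance findings in one 64 x 64 slice with 2/3 brain coverage
# at an uncorrected alpha of .05:
expected_false_positives = (2 / 3) * 64 * 64 * 0.05   # ~136.5
```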
The basic assumption of Bonferroni correction is that the multiple comparisons are independent of each other. If this is not the case, conservative tests result, which means that the probability of type II errors (erroneously maintaining the null hypothesis although the alternative hypothesis is true) is increased. In brain imaging, where neighboring data values show higher correlation than more distant ones (a fact which is, by the way, amplified by the spatial smoothing which "smears" activity from pixels to their neighbors), multiple tests may show substantial dependence on each other. Thus, another means of p-value correction has been proposed that relies on the theory of random Gaussian fields (see, e.g., Worsley, 1996ab; Friston et al., 1995b). Admitting that this might be an overly simplified statement, the basic approach is to estimate the smoothness or "spatial correlation" in the images. From this value, it is derived how many independent observations can be statistically assessed, and the statistical inference is corrected for this number to maintain the probability of committing type I errors for all comparisons at the originally chosen α-level. In case there is no spatial correlation, i.e. data values are completely independent, corrected p-values are identical to Bonferroni-corrected ones. In case all observations are entirely dependent, the statistical comparisons will also be entirely dependent, and one single uncorrected test will be sufficient. However, the most common and more complicated case is that data are partially correlated.
In this case (without going into too much detail), corrected p-values are calculated as follows: First, the number of resolution elements (resels) of an image is determined, with one resel being a block of pixels of the same size as the FWHM of the smoothness of the image (for instance, with a matrix size of 64 x 64 and a smoothness of 3 x 3 pixels FWHM, 64/3 * 64/3, i.e. approximately 455 resels, are obtained for only one single slice). The number of resels is similar, but not identical, to the number of independent observations. Using the number of resels in an image, it is possible to calculate the most likely value of the so-called Euler characteristic (EC). The Euler characteristic is a topological measure based on the number of "peaks" and "holes" in an image (in our case, a statistical map, e.g. a t-map). The more peaks or "blobs" are contained in an SPM (i.e. the more values survived the chosen uncorrected threshold), the higher the EC will be. Given the chosen uncorrected threshold and the number of resels, the expected EC is a good approximation of the probability of obtaining blobs surviving this threshold under the assumption that the image conforms to a random field, and can thus be used to determine the corrected p-value. After all these steps have been accomplished, a map is computed which contains only pixels which survived the chosen threshold, and which thus are considered to reflect a significant or non-random response. Such a map can be calculated separately for the different explanatory variables by defining so-called t- or F-contrasts. For example, we might test whether activity during stimulus presentation is higher than during the interstimulus-intervals by appropriately weighting the parameters of these predictors in the linear regression equation and by setting the remaining parameters (confounds etc.) to zero.
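The resel arithmetic, plus a small simulation showing that a thresholded pure-noise image still contains supra-threshold blobs (the smoothness value is the illustrative one from the text; the noise field is synthetic):

```python
import numpy as np
from scipy import ndimage

# Resel arithmetic from the text: a 64 x 64 matrix with a smoothness
# (FWHM) of 3 pixels in each direction contains (64/3) * (64/3) resels.
matrix, fwhm = 64, 3
resels = (matrix / fwhm) ** 2        # ~455.1 for a single slice

# Thresholding a smoothed pure-noise image still leaves supra-threshold
# "blobs", which is exactly the situation random-field theory corrects for.
rng = np.random.default_rng(3)
noise = ndimage.gaussian_filter(rng.normal(size=(64, 64)), sigma=1.3)
noise /= noise.std()                 # standardize the smoothed field
blobs, n_blobs = ndimage.label(noise > 2.0)
# n_blobs > 0 although no signal whatsoever is present in the image.
```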
Footnote 4: Generally, it also seems questionable whether a correction for multiple comparisons is required by all kinds of analyses, especially when inference at the voxel-level is sought. In the latter case, all brain voxels are assessed to test the null hypothesis that there is no activity in the brain related to the experimental condition(s). Whenever any of these voxels exceeds the pre-defined threshold, it is taken as evidence for the alternative hypothesis. However, such an unspecific hypothesis should only be tested in purely exploratory studies. Usually, the question whether there is activity in the brain at all is already well-established by former studies. Instead, specific assumptions about the involvement of (a network of) brain regions exist, which is a hypothesis completely different from the one for which the multiple comparisons are corrected (see also the discussion in chapter 6).

We might also want to know which brain areas are more active in a certain task than in another and vice versa (e.g., movements of the left vs. the right finger). The resulting p-value maps and their stereotactic coordinates are usually reported in the literature as the result of a massive data reduction (sometimes from millions of pixels to only one well-defined "blob" of activity). Inferences can be drawn on three levels, which are the voxel-, cluster-, and set-level (Friston et al., 1995a). If inferences at the voxel level are sought, it is tested which voxels have an intensity equal to or higher than a chosen threshold h, which gives a test with generally high regional specificity but low sensitivity.
Inference at cluster level implies assessing the probability of observing a cluster c of size k or more, defined by a threshold u (since in fMRI, cluster-level inference is more powerful when a lower threshold is chosen, this threshold is usually lower than the one used for voxel-level inference; see Friston et al., 1995a). This gives a test with generally higher sensitivity, but lower specificity, since protection against the risk of committing an error at the voxel level is not given. Finally, inference can be drawn at the set level, which boils down to the probability of obtaining a certain number of clusters or a cluster-set in a given image (more specifically, of obtaining c or more clusters with k or more voxels, above a threshold u). This results in a test which has high sensitivity but low localizing power/regional specificity. Although Friston et al. (1995a, p. 223) "(...) envisaged that set-level inferences will find a role in making statistical inferences about distributed activations, particularly in fMRI", they are hardly ever encountered in the results sections of fMRI or PET research reports. As this very brief and far from comprehensive summary has shown, a number of time-consuming and computationally expensive analysis steps have to be accomplished in order to obtain statistical parametric maps. Also, several assumptions and approximations concerning the distribution of the data have to be made with SPM and its GLM approach. These assumptions might not always hold, especially when the sample size is low, which is often the case (see also Holmes et al., 1996; Vitouch & Glück, 1997). However, SPM is a research tool widely accepted by the functional neuroimaging community (an acceptance which might partially be related to its freeware status and the need for a standardized analysis framework), and it is being constantly improved by one of the presumably best-funded and best-staffed labs in the world.
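Cluster-extent thresholding can be sketched as follows; the statistical map, threshold u and extent k below are all invented for illustration (real cluster-level inference additionally assigns corrected p-values via random field theory):

```python
import numpy as np
from scipy import ndimage

# Sketch of cluster-extent thresholding on a synthetic statistical map:
# voxels above a (lower) cluster-forming threshold u are grouped into
# connected clusters, and only clusters with k or more voxels survive.
rng = np.random.default_rng(4)
tmap = rng.normal(size=(64, 64))
tmap[20:26, 20:26] += 3.0            # one genuine 6 x 6 "activation"

u, k = 2.0, 10                       # hypothetical threshold and extent
labels, n = ndimage.label(tmap > u)
sizes = ndimage.sum(np.ones_like(tmap), labels, index=range(1, n + 1))
surviving = [i + 1 for i, s in enumerate(sizes) if s >= k]
# Isolated noise voxels exceed u but form only tiny clusters; the
# injected 6 x 6 region easily passes the extent threshold k.
```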
However, it must be noted that SPM represents only one way amongst many others (see, again, Lange et al., 1999, for an overview) of drawing statistical inferences with fMRI data, and, like almost all data analysis approaches for the still very young method of fMRI, it is work in progress. It should also be noted that the described general approach (scanning the whole search volume for significance) only applies if no definite a priori hypothesis exists. If, for instance, we had the specific hypothesis that parietal cortex is more active during mental rotation than during reading, we might test this by specifically assessing activity in this brain region using a region-of-interest approach (see also Worsley et al., 1996b).

2.2 Electroencephalography (EEG), Event-Related Potentials (ERPs), and Slow Cortical Potentials (SCPs)

Compared to fMRI, electroencephalography (EEG) is a relatively old technique which was applied in humans for the first time in 1928 by Hans Berger. Some time ago, it appeared that EEG and the associated technique of ERPs would be replaced more and more by the tomographic techniques and even by magnetoencephalography (MEG; Crease, 1991; Wikswo et al., 1993). However, in recent years, a kind of re-launch or re-appreciation of the EEG technique seems to be taking place. This might have been triggered by the insight that the temporal resolution of EEG makes it a technique which can keep pace with the speed of human information processing and the associated fast changes in neural activity. In addition, the main disadvantage of EEG, namely its comparably low spatial resolution, is compensated more and more by technical and methodological advances (such as an ever increasing number of recording channels, improved and more sophisticated source localization and surface mapping techniques, etc.).
Although there are several ways to analyze EEG data (such as spectral analysis, coherence analysis, event-related de-/synchronization), the most common approach is the computation of event-related potentials (ERPs). When topographically recorded, ERPs can be used to map changes in neural activity related to an internal or external event with millisecond resolution (however, see also chapter 2.3). Depending on their latency, morphology and supposed functional significance, ERPs can be subdivided into several components or component classes. For our purposes, the most relevant distinction is between transient phasic responses with an early latency, and sustained tonic responses with rather late onset latencies. These "late components" have been given several labels, such as slow potentials, slow waves, slow potential shifts, slow potential changes, slow brain potentials, steady potentials, DC or DC-like potentials, DC shifts and many others. Unfortunately, a clear terminology is still not at hand. Interestingly, many researchers who have not yet had the opportunity to work with these "late components" associate this kind of research with the measurement of the absolute level of DC activity (which would be called the steady potential). This was an approach frequently used in the "early days" of slow potential research, and some rather rare approaches still exist in which the absolute level of DC activity is taken as an indicator of brain activity, or, rather, brain activation (e.g., Schmitt et al., 2000; Trimmel et al., 2000). However, most studies of today (including our own) use the technique of event-related data acquisition and averaging to achieve a sufficient signal-to-noise ratio (with the signal in this case being the slow cortical activity).
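The rationale for event-related averaging can be made concrete with simulated data: a small ramp-like signal buried in much larger background EEG becomes visible after averaging, since the signal-to-noise ratio grows with the square root of the number of epochs (all amplitudes and counts below are invented):

```python
import numpy as np

# Simulated demonstration of event-related averaging: a ramp-like SCP of
# a few microvolts is buried in much larger ongoing EEG. Averaging N
# epochs shrinks the noise by a factor of sqrt(N) while the time-locked
# signal is preserved.
rng = np.random.default_rng(5)
n_epochs, n_samples = 40, 500
signal = -5.0 * np.linspace(0, 1, n_samples)     # "ramp-like" SCP, in uV
noise_sd = 20.0                                  # background EEG, in uV

epochs = signal + rng.normal(0, noise_sd, (n_epochs, n_samples))
average = epochs.mean(axis=0)

residual_sd = (average - signal).std()
# residual_sd is close to noise_sd / sqrt(n_epochs), i.e. ~3.2 uV, so
# the 5 uV ramp, invisible in single epochs, emerges in the average.
```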
When event-related averaging is used, the term event-related slow cortical potentials seems to describe most precisely which kind of data was collected and how it was analyzed. Thus, I will also adhere to this terminology, but will use for the sake of brevity the abbreviation SCP or SCPs.

2.2.1 Event-related Slow Cortical Potentials (SCPs)

SCPs are rather difficult to define on a purely conceptual level. However, they can be characterized based on the morphology of their time-course (which might be described as "ramp-like"; see also Rösler et al., 1997), their onset latency, and the EEG recording equipment. The first description of SCP-like components was tightly coupled with two paradigms. These paradigms were designed to elicit either a Bereitschaftspotential (BP; Kornhuber & Deecke, 1964), or a Contingent Negative Variation (CNV; Walter, 1964). In the classical BP experiment, where subjects have to perform voluntary movements, a movement-preceding increase in negativity could be observed which was thought to reflect neuronal preparation and/or programming of the movement execution. A similar ramp-like increase in negativity of considerable amplitude can be observed in the interstimulus interval of a CNV paradigm, where subjects have to respond to an imperative stimulus after previous presentation of a warning stimulus. Thus, both paradigms investigated activity during the anticipation of or the preparation for an event. This led to the hypothesis that such negative variations of the EEG amplitude might be related to the mobilization of resources. This mobilization was thought to be achieved via a lowering of synaptic thresholds and a resulting increase in cortical excitability that would be consumed in the reaction part of the paradigm, being reflected in a positive signal deflection following the response (see, e.g., Birbaumer et al., 1990; Rockstroh et al., 1989; Elbert, 1993).
However, ramp-like negative amplitude increases have meanwhile been observed in a variety of experiments which required not only simple response preparation or target detection, but also the processing of more or less complex cognitive and sensory tasks (for recent overviews, see Bauer, 1998; Bauer et al., 1998; Rösler et al., 1997). To list only a few more recent examples, SCPs were observed during visuo-spatial imagery (e.g., Lamm et al., 1999, 2001; Bajric et al., 1999), numerical reasoning and associated negative emotions (Fretska et al., 1999), affective speech processing (Pihan et al., 1997, 2000), memorization of verbal and spatial material (Rolke et al., 2000), piano playing (Vitouch et al., 1998), and acoustic perception during different states of consciousness (Fitzgerald et al., 2001). Common to all these experiments is that the assessed cognitive or sensory event required prolonged information processing, ranging from about 2 sec (Fitzgerald et al., 2001) up to about 25 sec in the piano-playing study. Thus, one might conclude that one requisite for SCPs to show up on the scalp surface is prolonged information processing. Fig. 2-11 and Fig. 2-12 exemplify such SCPs recorded in two different experiments. Fig. 2-11 shows SCPs from an experiment in which subjects were presented with simple tones of 2 sec duration. Following an initial transient response with a latency of around 150 msec and a fronto-central peak, a sustained negative potential can be observed with a similar topography, which dissolves within about 200 msec after the stimulus is turned off. The topography and the time-course of this potential make it likely that it reflects ongoing neural activity in auditory cortex related to the prolonged sensory input. Fig. 2-12 shows results from a more "cognitive" experiment. Subjects had to perform a task requiring dynamic visuo-spatial imagery (see Lamm et al., 1999).
Again, following initial phasic potentials which now also include a P300-like component, negative-going potential changes can be observed which increase in amplitude during the whole interval of task processing and show a maximum over the occipito-parietal scalp (see Footnote 5).

Footnote 5: This, however, does not exclude that changes in steady potentials (SP) are an omnipresent phenomenon which is also present during shorter stimulus processing times. For instance, it has been shown that the amplitude and latency of P300 depend on the stimulus-preceding SP level, or that CNV amplitude is diminished depending on the CNV-preceding negativity (see, e.g., Bauer et al., 1993; Gaillard & Näätänen, 1980). However, since it is rather difficult to separate the phasic components from the underlying slow potentials, and since the amplitude of the latter makes up only a small fraction of the former, such SPs are usually ignored or cannot be unambiguously identified in the recordings (especially when an amplifier with a time-constant is used).

Two additional important aspects become evident in a thorough inspection of these two figures. One is that SCPs have a much lower amplitude and a much lower rate of amplitude change at the scalp surface than the "classical" evoked responses (that is also why the adjective "slow" seems to be quite appropriate, although I think it was originally chosen to reflect the slow response in terms of onset latency). For example, despite 5 sec of neural activity and a constant increase in negativity, electrode Pz does not reach an amplitude as high as the one the early phasic components have achieved within several milliseconds. This sometimes makes it quite difficult to separate the slower response from the early transient response, especially if the aim is to determine a discrete onset latency of SCPs (see also the discussion in chapter 2.3 and chapter 6).
The second, more pleasant aspect is that SCPs show processing-specific topographies. While the auditory evoked SCP shows a bilateral fronto-centrally dominated scalp distribution (see Footnote 6), visuo-spatial imagery evoked a parieto-occipital scalp maximum. Since such task-specific topographies have been observed in virtually all studies using SCPs, one might conclude that topographically recorded SCPs can be used as a fairly accurate indicator of cortical activity related to prolonged cognitive, sensory or motor processing (with, however, the same restrictions in the ability to localize as classical ERPs, see Footnote 5). While the early or transient responses, which most ERP research still focuses on, might be useful for the investigation of the effects of stimulus evaluation and of orientation towards a task, SCPs are useful for the investigation of the process of task-solving per se. This makes multi-channel SCP recordings especially attractive for the psychologist or cognitive (neuro-)scientist aiming to gain non-invasive access to the neural bases of human cognition.

Footnote 6: Köhler et al. (1955) were the first to explain this rather counter-intuitive scalp distribution as an effect of the gyrification and cytoarchitecture of the human auditory cortex. Since the human auditory cortex lies at the upper surface of the temporal lobe, its neurons are oriented radially towards the fronto-central and not towards the temporal scalp region. This results in activity that is picked up by the fronto-central electrodes, rather than by the temporal ones.

2.2.1.1 Measurement of Event-Related Slow Cortical Potentials

Recording SCPs requires a technology similar to the one used in the recording of "regular" EEG or of short-latency ERPs. However, additional provisions have to be met to avoid slow drifts or other artifact-related changes of the measurement baseline, since such artifacts might easily obscure the low-amplitude SCPs. In the Brain Research Laboratory (Department of Psychology, University of Vienna, Austria), which has a long tradition in recording SCPs and steady potential changes, the following standards have proven to provide recordings of excellent quality. The first indispensable requisite is a DC-amplifier (i.e. an amplifier with a theoretically infinite time-constant) with high input impedance (> 10 GΩ, based on theoretical considerations; Bauer et al., 1989; Bauer, 1998) and excellent baseline stability (< 5 μV/day). Commercial DC-amplifiers are now provided by Neuroscan Inc. (Synamps), albeit with a much lower input impedance (10 MΩ). A high input impedance is mandatory since it allows the currents flowing through the recording electrodes to be kept as low as possible, minimizing the danger of slow polarization in tissue below the recording electrode. Second, skin-scratching (Picton & Hillyard, 1974) is recommended to equalize inter-electrode impedance at a value ≤ 1 kΩ, and to minimize skin potential artifacts. Skin-scratching also allows the electrode-skin interface to be kept stable for a longer period of time compared to other skin-preparation techniques, such as abrasion of the upper layers of the epidermis. Third, non-polarizable electrodes (e.g., Ag/AgCl) have to be used to avoid polarization at the surface of the electrode, which might result in slow drift artifacts or signal attenuation. Fourth, in order to obtain mechanically independent recordings, and to avoid artifacts resulting from movements of electrodes relative to the skin, electrodes are mounted on electrode sockets glued to the scalp using collodion. An additional provision is to use degassed electrode gel, especially if long-term recordings (> 1 h) are performed. Otherwise, macroscopically invisible bubbles contained in the gel might migrate to the electrode surface due to thermal effects and alter the electrode potential, resulting in slow drift artifacts. According to our experience, it is also mandatory to fill electrodes with the gel at least half an hour before their application to allow for the stabilization of the electrode potential. Although this whole procedure results in some extra time for application (~ 2 1/4 h per subject in this thesis, in which 49 EEG, EOG and reference electrodes were used), the additional effort is justified by a higher quality of recordings and by a reduced number of trials which have to be excluded due to artifacts.

Fig. 2-11: ERPs and SCPs during prolonged acoustic stimulation (presentation of an 800 Hz tone for 2 seconds). Approximately 150 ms after tone presentation (ON), a phasic negative potential is evoked (arrowhead). After ~500 ms, a sustained negative potential develops which persists until the stimulus is switched off (OFF; indicated by two arrows). The left-hand side of the figure shows the SCP and CSD topography of this SCP at a latency of 2 seconds post stimulus onset (adapted from Bauer, 2001, Fig. 1).

Fig. 2-12: ERPs and SCPs during processing of a dynamic visuo-spatial imagery task. As in Fig. 2-11, phasic ERPs of negative and positive polarity are followed by a slow increase in negativity. On the left side, the topography of activity at a latency of 5 seconds after task presentation is shown. In contrast to Fig. 2-11, this topography shows a maximum over the posterior regions of the scalp, reflecting extended neural processing in the occipital and the parietal cortex (adapted from Lamm et al., 1999, Fig. 2).

2.2.1.2 Analysis of SCPs: visualization and SCP mapping

The main challenge in EEG and hence also in SCP research is to identify the cortical generators of the surface-recorded activity pattern. Up to about 15 years ago, ERP and SCP analyses were mainly based on a visual inspection and statistical analysis of wave-form or time-course differences (e.g. to investigate whether neurological or psychiatric patients show increases in the onset latency of certain ERP components compared to normal controls). Questions about the cortical generators of these differences could hardly be asked, since in most studies, only few recording channels were used. In recent years, triggered by faster computing hardware and visualization software and the availability of multi-channel amplifiers, several approaches have been developed to relate topographical activity to the underlying cortical anatomy. The most common of these is the computation of SCP maps. Since the actual surface distribution can be sampled only partially, map computation requires some kind of interpolation algorithm to estimate activity between the electrodes. The main aim of EEG mapping is to provide a more comprehensive and unbiased summary of the spatio-temporal pattern of activity. In addition, the usually rather blurred potential maps can be substantially "sharpened" by calculating scalp Laplacian or scalp current source density (CSD) maps (Fig. 2-13 gives an example of how considerable this effect can be). CSD maps reveal where volume current emerges from the cortex and is passed to the skull and the scalp; thus, they should provide a more precise estimate of the epicortical surface potential distribution (Nunez, 1989; Nunez et al., 1994; Srinivasan et al., 1996; Babiloni et al., 1996).

Fig. 2-13: Maps calculated using SCP amplitude and its CSD transform. While the topography based on the SCP amplitude is rather blurred and has low "spatial frequency", the CSD topography depicts several sinks and sources which, for instance, allow a differentiation of activities over the posterior scalp into parietal and occipital activity.
An additional advantage is that CSD maps are reference-free, and that they attenuate the low spatial frequencies ('smearing') introduced into the scalp potential distribution by volume conduction. However, this comes with the disadvantage of a reduced sensitivity to deeper sources. Independent of whether potential maps or their CSD transforms are computed, interpolation can be based either on a spherical head model (spherical-spline interpolation; e.g., Perrin et al., 1987, 1989) or on a realistic head model (e.g., Babiloni et al., 1996; Gevins et al., 1991). While the spherical approach is easier to implement and requires considerably less computation time, the realistic head model yields more accurate maps (see also chapter 4). When a realistic head model is used, information about the individual head geometry of each subject has to be acquired. Up to now, this information has been gained either from structural MRIs (Babiloni et al., 1996, 1997; Gevins et al., 1991) or from a three-dimensional digitization of the head surface (Huppertz et al., 1998; see also chapter 3). Either method requires the measurement of individual electrode coordinates. As chapter 4 will show, measuring these coordinates is yet another way to increase the accuracy of SCP and EEG mapping, since it avoids the interpolation errors that result from the unrealistic assumption that electrode coordinates are constant across subjects. In addition to the rather simple and straightforward mapping of surface activity, several more sophisticated source localization algorithms have been developed to solve the so-called inverse problem. Since none of these source localization procedures was used in this thesis, I will keep their description very short. Probably the best-known example is the localization of equivalent dipoles or dipole configurations that explain the surface distribution (see, e.g., Scherg & von Cramon, 1986; Scherg et al., 1993).
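To illustrate the CSD idea described above, a minimal nearest-neighbor (Hjorth-style) Laplacian can be sketched as follows. The electrode layout, amplitude values and neighbor lists are hypothetical, and real implementations typically estimate the Laplacian from spherical-spline interpolated maps rather than from this crude discrete approximation:

```python
import numpy as np

def hjorth_laplacian(potentials, neighbors):
    """Nearest-neighbor (Hjorth) Laplacian: for each electrode,
    subtract the mean potential of its neighbors, so that local
    deviations from the surrounding potential stand out."""
    csd = {}
    for elec, v in potentials.items():
        csd[elec] = v - np.mean([potentials[n] for n in neighbors[elec]])
    return csd

# Hypothetical 5-electrode cross layout (Cz surrounded by 4 neighbors),
# amplitudes in microvolts
potentials = {"Cz": -8.0, "Fz": -4.0, "Pz": -5.0, "C3": -3.0, "C4": -4.0}
neighbors = {"Cz": ["Fz", "Pz", "C3", "C4"],
             "Fz": ["Cz"], "Pz": ["Cz"], "C3": ["Cz"], "C4": ["Cz"]}

csd = hjorth_laplacian(potentials, neighbors)
print(round(csd["Cz"], 2))  # Cz is 4 µV more negative than its surround
```

The blurred potential map shows broadly negative values everywhere, whereas the Laplacian isolates the focal deviation at Cz, which corresponds to the "sharpening" effect visible in Fig. 2-13.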
Dipole solutions have often been criticized as being physiologically unrealistic, and they might indeed be of limited value in the analysis of electrophysiological activity during cognitive tasks, which usually recruit large and widely distributed networks of brain areas. Also, most source localization algorithms require rather specific hypotheses in the form of anatomical constraints, which, unfortunately, do not always exist in the still rather exploratory assessment of cognitive processing. However, a good example of how to overcome these limitations is the proposal of Scherg & Göbel (1998) to use activity clusters detected via fMRI as constraints for the dipole solutions. Several other source localization approaches, e.g. those pertaining to the general class of minimum-norm-based approaches (see, e.g., Pascual-Marqui, 1999, for an overview), try to account for the fact that current sources are extended and mostly nonuniform. Probably the best-known of these algorithms is LORETA (low resolution brain electromagnetic tomography; Pascual-Marqui et al., 1994), whose current version localizes current sources by computing low-resolution ("blurred") tomographic images, with the solution space being confined to cortical tissue defined by means of the stereotactic space of Talairach & Tournoux (1988).

7 This has been due to monetary reasons, since source localization software is rather expensive. In addition, as will be shown in detail in chapter 7, EEG data were not acquired simultaneously but consecutively, using two electrode sets. This would presumably affect the accuracy and reliability of source localizations. However, in a follow-up study to this thesis, supported by the Jubiläumsfonds der Oesterreichischen Nationalbank, it is currently planned to test the accuracy and plausibility of different source localization algorithms in the analysis of SCP data.
Yet another innovative approach was introduced with the so-called deblurring technique (see, e.g., Gevins et al., 1991, 1997). By explicitly considering the conductivities of the different compartments of the head (which are derived from MRIs), the potential distribution on the inner surface of the skull is computed, in order to reduce the serious distortion of scalp potentials caused by the high resistance of the skull. Actually, almost all contemporary source localization algorithms now also incorporate the differences in conductivity of the head, using either three- or four-shell head models or finite element models. Finally, it has to be kept in mind that the accuracy of surface mapping and of source localization algorithms strongly depends on the amount of spatial sampling achieved (which is a function of the number of recording channels and the size of the subject's scalp surface). The higher the sampling, the more accurate the maps and the source localization will be. Using 128 to 256 channels has been recommended based on the analysis of simulated and real data (see, e.g., Tucker et al., 1993; Srinivasan et al., 1998). However, the application of more electrodes comes at the expense of an increased electrode application time. This especially applies to SCP research, where the use of an electrode cap is not recommended, especially when longer processing epochs are to be investigated (see above). For example, the recent application of 64 SCP electrodes (see http://brl.psy.univie.ac.at/aktuelles/newamplifier.htm) resulted in an application time of ~3 h, and the average application time in the experiments performed for this thesis was ~2 1/4 h with "only" 42 EEG channels. Also, there are still a lot of studies published in which EEG was recorded from only 19 electrodes. Although there is no doubt that such low sampling should be avoided in the future, the validity of these studies should not be dismissed hastily.
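The relationship between channel count and spatial sampling can be roughly quantified by spreading the electrodes evenly over the scalp. The following sketch assumes a head radius of 10 cm and electrode coverage of half the sphere; both figures are illustrative assumptions, not measured values:

```python
import math

def mean_electrode_spacing(n_channels, head_radius_cm=10.0, coverage=0.5):
    """Rough center-to-center electrode spacing when n_channels are
    spread evenly over part of a sphere. head_radius_cm and coverage
    (fraction of the sphere covered by electrodes) are illustrative."""
    area = coverage * 4 * math.pi * head_radius_cm ** 2  # covered scalp area
    return math.sqrt(area / n_channels)                  # cm per electrode

# spacing shrinks from ~6 cm with 19 channels to ~1.6 cm with 256
for n in (19, 42, 128, 256):
    print(n, round(mean_electrode_spacing(n), 1))
```

Under these assumptions, only dense montages in the 128-256 channel range bring the inter-electrode spacing down to the ~2 cm scale, which is in line with the recommendations of Tucker et al. (1993) and Srinivasan et al. (1998) cited above.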
It seems more appropriate to ask what kinds of research questions one wants to answer and, even more importantly, can answer with EEG studies. Although recent attempts to turn EEG into a "true" neuroimaging technique have to be acknowledged, it appears questionable whether a spatial accuracy and validity comparable to tomographic techniques will ever be achieved, or at what cost it may be achieved. This is related not only to the spatial sampling issue, but to the more general issue of the inverse problem, which cannot be solved without applying numerous constraints and assumptions. Thus, it might be both more cost-efficient and more realistic to focus on the main advantage of ERPs, which is their temporal resolution, and to add spatial accuracy using other techniques (including whole-head MEG, where the collection of large-array data is much easier). When such a synergistic approach is used, it might be sufficient to achieve rough estimates of the cortical generators (with 1-2 cm spatial resolution). Another strategy might be to use "low-resolution" topographies to identify areas of interest in a first step. Depending on the results, a replication study with higher sampling can then be performed. In fact, such an approach was partially applied in this thesis, where it was attempted to refine the results of earlier studies in which 22 EEG channels had been used (see chapter 7).

2.2.1.3 Analysis of SCPs: statistical inference

The standard approach of drawing statistical inferences in SCP research is identical to the one recommended for the analysis of ERPs. Although nonparametric approaches have been repeatedly proposed to increase the robustness of inferences (Wassermann et al., 1989; Srebro et al., 1996; Karniski et al., 1994), using parametric analyses is common practice in EEG studies.
In most cases, similar to SPM, the general linear model is used to assess whether the experimental conditions affect the data in a predictable way. In practice, this means that univariate (ANOVA) or multivariate (MANOVA) repeated-measures analyses of variance are computed, and that their main effects and interactions are evaluated (Vasey & Thayer, 1987; O'Brien & Kaiser, 1985). A MANOVA can only be computed when the number of subjects exceeds the number of independent variables, which is rarely the case, especially in high-density studies. In case an ANOVA is calculated, the p-values for effects containing a repeated-measures factor commonly have to be corrected for violations of the so-called sphericity assumption (Vasey & Thayer, 1987). While significant main effects indicate differences in the mean amplitude calculated across all independent variables (usually the electrodes), a condition x electrode interaction indicates scalp topographies which differ between conditions or groups. It is this kind of statistical result that EEG experimenters are usually interested in, since it would signify that different brain regions are active in different conditions or groups. However, an influential paper by McCarthy & Wood (1985) questioned whether such an interpretation is justified at all. In a simulation study, they showed that a significant condition x electrode interaction might also be caused by a change in source strength without a concomitant change in source location or orientation. It was therefore suggested that some kind of normalization procedure should be applied prior to any statistical analysis of topographical data.
While this removes differences in the general amplitude level between conditions, the relative strength of activity across electrodes is preserved, and topographical differences due to genuine differences in the underlying neural generators are not affected. On the other hand, Haig et al. (1997; but see also Ruchkin et al., 1999) have argued that the conclusions drawn by McCarthy & Wood were based on unrealistic assumptions about the nature of ERP data and their neural generators. They show by example that the type of scaling McCarthy & Wood propose might obscure differences in source configuration, and they arrive at the conclusion that analyses of both raw and normalized data should be performed and reported in order to avoid such false negatives. This approach seems mandatory because the researcher is usually interested not only in whether source configurations differ between conditions, but also in whether there is a change in the strength of the same sources (e.g., if differences in task difficulty are investigated). In addition to testing omnibus effects via ANOVA/MANOVA, a-priori and post-hoc tests are usually performed in order to assess significant differences at certain factor levels. A-priori hypotheses are commonly tested via linear contrasts. Corrections for violations of the sphericity assumption have to be performed for such contrasts, too. This can be accomplished by using contrast-specific error terms instead of the pooled variance terms (Boik, 1981; Keselman, 1998). It should also be noted that multiple post-hoc tests (such as the Scheffé or Tukey test) are not exact in case of violations of sphericity and should thus not be performed. Instead, post-hoc hypotheses should be tested either via multiple independent linear contrasts or via, e.g., the test statistic proposed by Keselman (1982), which is roughly equivalent to computing multiple t-tests. In both cases, a correction for multiple comparisons (Bonferroni or other) is required.
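A minimal sketch of the normalization discussed above, in the common vector-scaling variant (dividing each condition's topography by its vector length across electrodes), might look as follows; the amplitude values and electrode count are hypothetical:

```python
import numpy as np

def vector_scale(topography):
    """Scale a topography (amplitudes across electrodes) to unit
    vector length, removing overall amplitude differences while
    preserving the relative pattern across electrodes."""
    t = np.asarray(topography, dtype=float)
    return t / np.linalg.norm(t)

# Hypothetical amplitudes at 4 electrodes in two conditions:
# same topographic shape, condition B simply twice as strong.
cond_a = [1.0, 2.0, 2.0, 1.0]
cond_b = [2.0, 4.0, 4.0, 2.0]

scaled_a = vector_scale(cond_a)
scaled_b = vector_scale(cond_b)
print(np.allclose(scaled_a, scaled_b))  # True: only amplitude differed
```

After scaling, the two conditions are identical, so a condition x electrode interaction computed on the scaled data would no longer be driven by the pure strength difference, which is exactly the confound McCarthy & Wood (1985) identified.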
2.2.1.4 Neural generators of SCPs

The cellular mechanisms giving rise to scalp-recorded SCPs are not substantially different from those generating other kinds of EEG activity. Thus, SCPs are mainly generated by postsynaptic potentials (PSPs) in the apical dendrites of cortical pyramidal cells (see Fig. 2-14). Such PSPs are triggered by unspecific and specific thalamo-cortical and intracortical axonal inputs (for an overview, see Birbaumer et al., 1990).

Fig. 2-14: Model of the generation of surface-negative potentials. Thalamo-cortical or intracortical afferents evoke EPSPs at the apical dendrites, resulting in an extracellular and intracellular flow of ions. The corresponding field potential is picked up by a surface electrode (reproduced from Birbaumer & Schmidt, 1999, Fig. 21-7).

Based on simultaneous recordings of intracortical and surface potentials, it seems well established that excitatory PSPs (EPSPs) in the upper cortical layers cause negative SCPs at the surface, and that a reduction in excitatory input and/or an increase in inhibitory PSPs (IPSPs) yields positive SCPs. Theoretical considerations and model simulations (see, e.g., Lutzenberger et al., 1987) have shown that the amplitude of surface SCPs is almost exclusively generated by cell assemblies of several thousands of equally oriented cortical neurons with apical dendrites near the recording electrode, and that even very strong subcortical and deeper sources can contribute only to a very small extent to the surface potential. However, even superficial PSPs are only visible at the surface when a considerable number of neurons is active simultaneously (i.e., at least roughly in phase). Also, SCP maxima are not always located exactly over the area of maximal activity. This results from the massive gyrification of the cerebral cortex, which causes an orientation of neurons (and thus of the resulting electrical field) which is not always perpendicular to the curvature of the scalp surface. This is of special importance in studies of the primary motor and somato-sensory cortex (but also of the auditory cortex; see Fig. 2-11, which provides an example of such an effect), where the folding of the cortex along the central fissure might even lead to a surface distribution with an activity maximum ipsilateral and not contralateral to the moving or stimulated body part. But even if the absolute localization of activity is difficult and requires precise knowledge of the geometry of the activated neural tissue, relative information provided by SCP topographies acquired in different conditions (e.g., movement of the left vs. the right arm) might help to identify the true generators, since different topographies will in most cases reflect a difference in the underlying cortical generators. Unfortunately, the opposite does not always hold, since the failure to reveal a topographical difference might also be due to an insufficient sampling of the surface distribution, or might result from the blurred transmission of potentials to the surface. This caveat has to be kept in mind especially if the investigated sources are not well separated in space, e.g. in somatotopic mapping. Long-lasting slow PSPs (s-PSPs; Libet, 1971) and glial cells (Roitbak, 1983) might also play a role in the generation of SCPs. s-PSPs last for several seconds up to minutes and seem to regulate the excitability of neurons. However, according to Birbaumer et al. (1990), the exact electrogenic mechanisms underlying s-PSPs have not yet been established, and it is not yet evident to which type of slow potentials s-PSPs contribute.
In my view, they are more likely (co-)responsible for the tonic, long-lasting changes which can be measured via steady potentials and steady potential changes than for the generation of SCPs. As for glial cells, several arguments speak for an at least indirect involvement in the generation of SCPs (see Roitbak, 1983, and Laming et al., 1998, for extensive reviews). Glial cells are omnipresent in cortical tissue and occupy about 50% of the cortical volume (Laming, 1998). They act as a local buffer of potassium (K) that is released into the interstitium by active neurons. When the amount of extracellular potassium is high, this triggers glial depolarization. It has been shown that the time course of this depolarization is very similar to scalp-recorded SCPs (Caspers et al., 1980; Caspers, 1993). This, however, need not be interpreted in the sense that these cells directly contribute to surface SCPs, since the field gradient evoked by glial depolarization drops steeply with distance. Instead, it seems to reflect the correlation of prolonged neural activity and the resulting K release/uptake by glial cells. On the other hand, the uptake of K might again indirectly lead to an amplification of the surface potentials, since positively charged ions are removed from upper and redistributed to lower cortical layers, which "sharpens" the already existing electrical field that is observed at the surface. Independent of whether glial cells contribute directly or indirectly to scalp SCPs, this contribution should not be considered an artifact or evidence that SCPs do not genuinely assess neural activity, since it is in fact the neural activity and the resulting K release that trigger this additional SCP generator.

2.3 Multi-modality neuroimaging

Fig. 2-15 provides an overview of the temporal and spatial resolution achieved by the methods which are currently available for investigating brain function. This figure perfectly illustrates why Gazzaniga et al. (1998, p. 119) have stated in their Cognitive Neuroscience textbook that "often the convergence of results yielded by different methodologies offers the most complete theories. A single method cannot bring about a complete understanding of the complex processes of cognition that rely on numerous brain structures." This becomes clearly evident when the temporal and spatial resolution of scalp-recorded electrophysiological techniques (EEG/MEG) and of blood-flow based imaging techniques (PET, SPECT, fMRI) are compared. While the former are generally believed to provide a high temporal and rather coarse spatial resolution, an excellent to good spatial and a moderate to poor temporal resolution is assigned to the latter. A combination of these methods should thus provide detailed information not only about the "where", but also about the "when" of brain activity during sensory, motor or cognitive processing.

Fig. 2-15: Comparison of the temporal and spatial resolution of most of the currently available methods of investigating brain function (reproduced from Gazzaniga et al., 1998, Fig. 3.40).

However, Gazzaniga & colleagues' statement does not only apply to synergies with respect to the rather "technical" issue of temporal and spatial resolution. It is evident that the information provided by the techniques depicted in Fig. 2-15 can be vastly different. Different aspects of cognitive function and underlying brain activity are assessed by different methods.
For example, the relatively new technique of TMS (transcranial magnetic stimulation; see, e.g., Pascual-Leone et al., 1999) provokes localized and transient "lesions" in the brain by interrupting neural processing and/or causing neuronal discharges in the stimulated brain tissue. Delivered at different times and to different brain regions, TMS can be used to investigate the effects of such lesions on behavioral parameters such as reaction time and/or the correctness of an answer. Whenever effects on these parameters are revealed, it might be concluded that the stimulated region is (at least indirectly) involved in the processing of the investigated task. This represents a completely different approach to brain function than the one pursued with fMRI or ERPs or, generally, with neuroimaging devices, where the neural activity evoked by information processing is observed. Thus, one always has to be aware of the correlative nature of neuroimaging data, since, in the worst case, a region that "lights up" might be functionally irrelevant for the function which is investigated (see also Sarter et al., 1996). To make things even more complicated, the various neuroimaging techniques also provide access to different aspects of neural processing. While, e.g., ERPs directly measure changes in postsynaptic potentials (albeit only of larger, simultaneously active patches of cortical tissue), fMRI or PET assess changes in blood flow which are more or less directly related to changes in the metabolism of neural tissue. Such differences in the measurement substrates should, of course, be considered when comparing results between techniques (see also McCarthy, 1999; Nunez & Silberstein, 2000). Hence, I will briefly discuss some issues which are relevant for a comparison and combination of fMRI and ERPs (and especially SCPs) in the study of cognitive processing.
This will include a) a discussion of the synergies which can be achieved with respect to the temporal and spatial resolution of the two methods and with respect to the difference(s) in the measurement substrate, and b) a very brief discussion of practical and theoretical problems associated with multi-modality imaging, such as the choice of the stimulation paradigm, the coregistration of data, and the interpretability and comparability of results.

2.3.1 Combination of fMRI and SCPs: synergies

2.3.1.1 Temporal and spatial resolution

Clearly, one of the main aims of a combination of fMRI and ERPs/SCPs is to increase the temporal and spatial resolution with which task-related brain activity can be assessed. As already noted, ERPs are usually recommended for their temporal resolution in the milliseconds range, whereas fMRI is praised for its spatial resolution in the millimeter range. On the other hand, ERPs are criticized for their coarse spatial resolution and limited ability to localize. Similarly, the temporal resolution of fMRI is thought to be rather poor, a statement which is mainly based on the sluggishness of the hemodynamic response. Thus, it is generally believed that a combination of the two methods will reveal both the location and the timing of cognition-related brain activity. However, some critique or refinement of these statements might be required. Regarding temporal resolution, it seems debatable whether a milliseconds resolution of neural activity is truly provided by ERPs or SCPs, although it is certainly true that electrophysiological signals can be sampled by modern amplifiers in the milliseconds range. From a purely technical point of view, each method can resolve signals (both in space and time) up to half its spatial and temporal sampling frequency (according to the Nyquist theorem; see, e.g., Glaser & Ruchkin, 1976).
Thus, if we sample the EEG signal at, say, 250 Hz, signal changes as fast as 125 Hz can be monitored without bias. However, this does not imply that the signal one is interested in also changes at such a fast rate. This applies even more to SCPs, since they seem to reflect changes in event-related activity which occur at a much slower rate than those of conventional or phasic ERPs. If we inspect, e.g., Fig. 2-11, it becomes evident that the slow potential component of the averaged signal does not appear before about 400-500 ms post stimulus. Nevertheless, the argument discussed below regarding absolute and relative statements certainly applies. An absolute determination of activity onset might not be possible with a resolution higher than 200-300 ms. However, if we are interested in comparing activity between conditions and between regions, differences in onset timing might become visible in the 50-100 ms range. Similarly, it is only the technical spatial resolution of fMRI that is in the millimeter range. Some of this accuracy is lost during the numerous preprocessing steps (such as motion correction and stereotactic normalization, which both require interpolation). In addition, whenever inter-subject averaging or some kind of comparison across subjects is required, the technically achieved spatial accuracy is sometimes considerably reduced by spatial smoothing (which is performed to account for interindividual differences in both structural and functional anatomy). Sometimes, smoothing filters as large as 20 mm FWHM are recommended for statistical group comparisons (see chapter 2.1). This definitely blurs the acquired data and allows only approximate inferences about the involvement of smaller structures (such as the thalamus or the amygdalae). On the other hand, the spatial resolution of EEG and the temporal resolution of fMRI might be better than widely believed.
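The sampling argument made at the beginning of this passage can be verified numerically: a frequency above the Nyquist limit is not merely missed but aliased onto a lower frequency. A 200 Hz sine sampled at 250 Hz yields exactly the same samples as a phase-inverted 50 Hz sine (the alias at |250 − 200| Hz), so the two are indistinguishable after sampling:

```python
import numpy as np

fs = 250.0                    # sampling rate in Hz
t = np.arange(0, 1, 1 / fs)   # 1 s of sample times

# A 200 Hz sine is above the 125 Hz Nyquist limit of a 250 Hz
# recording; its samples coincide with a phase-inverted 50 Hz sine.
fast = np.sin(2 * np.pi * 200 * t)
alias = np.sin(2 * np.pi * 50 * t)
print(np.allclose(fast, -alias))  # True: the 200 Hz signal is aliased
```

In EEG practice this is why analog low-pass (anti-aliasing) filtering below the Nyquist frequency is applied before digitization.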
For example, a dipole localization accuracy of 7-8 mm for EEG and of 3 mm for MEG has been demonstrated using a human skull phantom (Leahy et al., 1998). Also, single-slice fMR images can be collected within about 50 ms with modern scanners. And although the hemodynamic response is usually delayed by several seconds, this does not mean that the relative timing between regions cannot be assessed with a much higher resolution. This was recently demonstrated by Menon & colleagues (Kim et al., 1997; Menon et al., 1998; Menon & Kim, 1999), who achieved a separation of hemodynamic responses with a temporal resolution of 50 to 125 ms in averaged, and of 1-2 s in single-trial responses. However, such values could only be achieved when the signal time-courses of different brain regions were compared. Within the same region, the upper limit of temporal resolution was approximately 5 s (Kim et al., 1997; note, however, that the TR of this study was 0.87 s, and a higher temporal accuracy might have been achieved with a faster repetition rate). The latter results point towards an important aspect of any discussion of the temporal and spatial resolution of neuroimaging methods, namely the type of inference we are interested in: do we want to draw absolute inferences, or is a relative inference sufficient? Whenever only a relative statement is required, the weaknesses of the two methods become less severe. As for ERP and SCP studies, differences in topographies revealed with a low spatial sampling might be sufficient to infer that different neural generators are involved in two different sensory, motor or cognitive tasks. Even a separation of the scalp topographies of very similar tasks with very similar and closely spaced neural sources (e.g., in somatotopic mapping) might be achieved with sufficient reliability.
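The relative-timing logic used by Menon & colleagues can be sketched as follows: two hemodynamic responses with a 100 ms onset difference are generated, and their relative delay is recovered by cross-correlation at a fine sampling interval. The gamma-shaped response and its parameters are simplified assumptions for illustration, not the response models used in those studies:

```python
import numpy as np

dt = 0.05                      # 50 ms sampling (e.g., single-slice fMRI)
t = np.arange(0, 20, dt)

def hrf(t, onset):
    """Simplified gamma-shaped hemodynamic response (hypothetical
    parameters): slow rise after `onset`, slower decay."""
    s = np.clip(t - onset, 0, None)
    return s ** 5 * np.exp(-s)

region_a = hrf(t, onset=1.0)
region_b = hrf(t, onset=1.1)   # onset 100 ms later in region B

# Estimate the relative delay by maximizing the cross-correlation.
lags = np.arange(-20, 21)
corr = [np.dot(region_a, np.roll(region_b, -k)) for k in lags]
lag = lags[int(np.argmax(corr))] * dt
print(round(lag, 2))           # recovered delay in seconds
```

Even though both responses peak several seconds after their onsets, the 100 ms difference between regions is recovered exactly, which is the point of the relative-inference argument above.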
However, the absolute determination of the generators of these topographies is usually rather difficult and cannot be achieved without additional, sometimes rather serious constraints which have to be imposed on the source localization algorithm. As for fMRI, the same seems to apply to its temporal resolution. While the sluggishness of the hemodynamic response makes it difficult to separate changes in neural activity within the same brain region, conclusions about the temporal sequence of activity are still possible when the onsets of signal changes across regions are compared (see, e.g., Buckner et al., 1998; Miezin et al., 2000). On the other hand, it should be noted that several studies have shown that the onset of the hemodynamic response can be quite different across different regions of the brain. This requires some caution in equating hemodynamic onset times with the onset of neural activity. Another aspect that has to be kept in mind is the temporal resolution that is actually required by a given study. Certainly, the "temporal resolution" of most cognitive task paradigms is much lower than that of sensory or motor tasks. In the latter, we might observe changes in neural activity within and across regions far below 100 milliseconds. For example, the preparation and execution of such a "simple" act as a finger movement involves a rapid sequence of activities in the brain regions which control this movement. This starts with preparatory or programming activity in the premotor region, proceeds to executive activity in the primary motor cortex, and ends with activity in the somatosensory cortex reflecting the sensation associated with the movement.
For both fMRI and EEG, it is especially challenging to separate the latter two processes: while their temporal separation would be difficult for fMRI, their spatial separation would be tricky for EEG due to the vicinity of the somatosensory and the primary motor cortex. A different picture arises when we are studying cognitive events, which are usually much more extended in time. For example, the comparison of two objects at an angular difference, as in the mental rotation paradigm introduced by Shepard & Metzler (1971), usually takes several seconds. Thus, during a longer period of time almost no change in the invoked cognitive functions occurs, and the associated neural activity might therefore be resolved even at sampling intervals of more than a second. A similar argument applies to the cube comparison task used in this thesis. An initial phase of stimulus evaluation and mental image generation is followed by an extended period (of up to 60 seconds!) of cube rotation and cube comparison. In this period, there should not be too much variability in the involved cognitive processing. Therefore, the corresponding brain activity should also be rather constant, and the sampled brain signals should be rather smooth (which is, indeed, shown by several studies, including the ones performed in this thesis). On the other hand, it has to be considered that the solving of mental rotation tasks does not require mental rotation alone. Cognitive processes such as object identification, generation of a mental image, matching of the rotated with the reference object, and response execution are also involved. Since these processes require much less time, they are much more difficult to separate in time and might not be depicted appropriately by the imaging techniques. Also, some of these processes seem to take place in adjacent and overlapping regions and are thus also difficult to separate in space.
For instance, it has been shown that the solving of mental rotation tasks is accompanied by activity in premotor regions. Parts of the premotor and primary motor cortex are also involved in response preparation and execution, and in the control of goal-directed voluntary eye movements. Thus, despite a long tradition in the neuroimaging of mental rotation paradigms, it is still a matter of debate whether or not premotor activity is specifically related to the visuo-spatial operations required by the tasks (see also the discussion in chapter 6). In general, it should be noted that the technical temporal and spatial resolution of fMRI and ERPs can be substantially improved by choosing "tuned" measurement protocols. For example, if imaging is confined to a single slice, much higher fMRI sampling rates can be achieved than when whole-head coverage is required (50 ms vs. ~1-2 sec). Also, a much better in-plane spatial resolution than the commonly reported 3 x 3 mm with EPI can be achieved when more time is invested in acquiring an image. Similarly, the spatial resolution of EEG measurements can be improved when electrodes are densely placed over a certain brain region. Unfortunately, these strategies come at some expense, namely a confinement of results to the pre-experimentally selected brain volume or a reduction in temporal resolution. One reason for this might be that cognitive events are not as time-locked as "simple" sensory or motor events. This sometimes makes it questionable to average across trials in order to increase the signal-to-noise ratio. Let us assume, for example, that a mental rotation task is solved by carrying out the above-mentioned cognitive processes in a sequential manner (object identification -> mental image generation -> mental rotation of the mental image -> matching of the rotated image with the reference object). Using neuroimaging, we should be able to identify the neural activities associated with these processes.
However, their onset and duration are not constant. For instance, mental rotation might not be successful and will therefore be repeated (e.g. because the subject chose an inappropriate direction of rotation). This results in a jitter of the latency and duration of the cognitive processes. Averaging such data will therefore only give a rather crude summary of both the spatial and temporal aspects of the neural processing involved in task solving (see also Flexer, 1999). However, as already discussed above, the advent of single-trial fMRI might provide a solution to this problem. For instance, analysis of the data acquired for this thesis with the exploratory data analysis technique of fuzzy clustering revealed different signal time courses in parietal, premotor and primary motor cortex (Windischberger et al., 1999). While pixels in parietal and premotor cortex showed an early onset of signal increase persisting until task response, the completely different time course of pixels in the primary motor cortex contralateral to the response-executing hand suggested that this region was not active during task processing itself, but only immediately before response execution. Similarly, it has been shown (Richter et al., 1997, 2000) that the reaction time in a mental rotation task correlated with both onset and width of parietal and premotor signal changes, while it correlated with only the onset in the contralateral primary motor cortex. The present chapter should not be misunderstood in the sense that a combination of fMRI and EEG is unnecessary because fMRI can achieve excellent temporal and EEG excellent spatial resolution. It should merely become evident that the required temporal and spatial resolution always depends on the task paradigm and on the scale of the neural events one is interested in.
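The detrimental effect of such latency jitter on across-trial averaging can be demonstrated with a toy simulation; the Gaussian "component", its 50-ms width, and the 150-ms jitter are arbitrary assumptions chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(0.0, 2.0, 0.002)               # 2-s epochs sampled at 500 Hz

def component(t, latency, width=0.05):
    """Brief Gaussian-shaped response peaking at `latency` seconds."""
    return np.exp(-0.5 * ((t - latency) / width) ** 2)

n_trials = 200
# Perfectly time-locked trials: the average preserves the component.
locked = np.mean([component(t, 1.0) for _ in range(n_trials)], axis=0)
# Trials with 150-ms latency jitter: the average is smeared and attenuated.
jittered = np.mean(
    [component(t, 1.0 + rng.normal(0.0, 0.15)) for _ in range(n_trials)],
    axis=0,
)
attenuation = jittered.max() / locked.max()  # well below 1 for jittered data
```

The jittered average is both broader and substantially smaller in peak amplitude than the underlying single-trial response, which is exactly the "crude summary" problem described above.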
Thus, imaging has to be specifically "tuned" to each new research question. This can be based either on one method alone, or on the combination of two or even more methods. Clearly, there is no method which is superior in all situations, and it should have become evident in this chapter that a realistic evaluation of the respective weaknesses and strengths of the various techniques will allow for more progress in brain research than the simplistic praise of fMRI for its spatial and of EEG for its temporal resolution (see also Nunez & Silberstein, 2000).

2.3.1.2 What is measured: neural activity vs. hemodynamic response

fMRI and ERPs provide different kinds of information about brain activity. Commonly, this is summarized in the statement that fMRI measures only a correlate of neural activity (the hemodynamic response), while ERPs measure neural activity "directly". However, this summary is too simplistic if we want to thoroughly consider the differences and synergies between the two types of measurements. Apart from the fact that the generation of the hemodynamic responses measured by BOLD-fMRI is still not fully understood, ERPs themselves only measure certain aspects of the various types of "neural activity". In this chapter, I will compare in some more detail the types of neural activity assessed by the two methods and discuss the consequences for their comparability and for potential convergences or divergences. As discussed in chapter 2.2, only certain aspects of "neural activity" are reflected in ERPs. These are mainly changes in ionic concentration due to input to the apical dendrites of pyramidal cells in the upper cortical layers. In order to produce surface-measurable field changes, neurons have to align in a certain way to form an open field. This is one reason why the activity of stellate cells is not detectable through scalp measurements (see Nunez & Silberstein, 2000).
For a similar reason, neuronal activity in subcortical structures rarely shows up at the surface, since neurons in these structures have heterogeneous orientations. The greater distance of subcortical structures or deeper cortical layers from the surface additionally attenuates their amplitude (Lutzenberger et al., 1987). ERPs also show different sensitivity depending on the geometrical orientation of the active neural tissue, with sensitivity being highest for neurons oriented radially to the scalp surface. This is particularly important as most of the cortical surface is considerably folded, with many neurons lying in sulci and thus not oriented exactly normal to the scalp surface. In addition, large pools of neurons have to be active synchronously to produce a measurable surface signal. When activity is not in phase, it might even cancel out and remain invisible to the epicranial sensors. Thus, it has to be kept in mind that ERP measurements do not provide a homogeneous sampling of "neural activity", but only a selectively "weighted" image of neural computations. Asynchronous and deeper activities remain mostly unseen, and the geometrical orientation and the resulting projection of activity onto the scalp surface have to be considered in data interpretation. On the other hand, with a few rare exceptions, differences in the timing and amplitude of surface potential changes between and within brain regions can be used in a quantitative way to assess the amount and onset of neural activity. In addition, there is no requirement that activity be temporally extended in order to be detectable, since ERPs will reflect even very short changes in ionic concentration.
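The orientation dependence can be made concrete with the potential of a current dipole in an infinite homogeneous conductor, a strong simplification of realistic head models (which use spherical-shell or boundary-element geometries); the positions and conductivity below are purely illustrative:

```python
import numpy as np

def dipole_potential(p, r_src, r_el, sigma=0.33):
    """Potential (V) of a current dipole `p` (A*m) located at `r_src`,
    evaluated at electrode position `r_el`, in an infinite homogeneous
    medium of conductivity `sigma` (S/m): V = p.d / (4*pi*sigma*|d|^3)."""
    d = r_el - r_src
    return np.dot(p, d) / (4.0 * np.pi * sigma * np.linalg.norm(d) ** 3)

electrode = np.array([0.0, 0.0, 0.09])   # "scalp" electrode, 9 cm from origin
source = np.array([0.0, 0.0, 0.07])      # cortical source 2 cm beneath it

# Radial dipole (pointing at the electrode) vs. tangential dipole:
radial = dipole_potential(np.array([0.0, 0.0, 1e-8]), source, electrode)
tangential = dipole_potential(np.array([1e-8, 0.0, 0.0]), source, electrode)
# Directly above the source, only the radial component produces a potential;
# the tangential dipole contributes nothing at this electrode.
```

In a bounded, layered head the tangential contribution is attenuated rather than exactly zero, but the qualitative asymmetry between radial and tangential sources carries over.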
Also, increased surface negativity in relation to some baseline can (again with some occasional exceptions) be interpreted as an indicator of increased activity, while positive potentials indicate a reduction of activity and/or an increase in inhibitory activity. As for fMRI, it is more difficult to provide a definite account of the kind of neural events that lead to a BOLD response, since its neurophysiological and neurovascular mechanisms are yet to be understood comprehensively (see, e.g., Jueptner & Weiller, 1995, and Magistretti & Pellerin, 1999, for some recent models). However, even without such knowledge, it is at least possible to state under which empirical conditions BOLD-contrast responses do or do not occur. fMRI samples brain activity rather homogeneously. Thus, all regions of the brain can be imaged equally well, independent of the type and orientation of their neurons, of whether or not they form an open or closed field, and of the depth of the active structure. One exception to this rule are regions which are prone to susceptibility artifacts, e.g. the anterior parts of the temporal lobe and the orbitofrontal cortex. However, shimming the static magnetic field to these volumes of interest, tailored slice positioning, and multi-shot imaging can be utilized to reduce such artifacts. Since the BOLD response is triggered by the metabolic demand of neurons, there is also no requirement that neurons be active in phase. On the other hand, signal amplitude seems not to be as directly related to the amount of neural activity as in ERPs, since blood flow appears to increase in the sense of an all-or-nothing law. Although it has been shown that events as short as 30 ms produce a measurable hemodynamic response, one requirement for a BOLD response with detectable amplitude might be that neuronal activity extends in time and possibly also in space.
Also, due to the already discussed sluggishness of the hemodynamic response, sustained changes in activity are more easily detected than transient or rapid activity changes. Last but not least, it has to be kept in mind that fMRI signal increases do not unambiguously indicate whether they are related to an increase or a decrease in neural activity, since the activity of both excitatory and inhibitory neurons leads to an increase in metabolic demand. The present discussion has shown that there are a number of reasons for combining fMRI and ERPs beyond the dominantly discussed issue of temporal and spatial accuracy. While fMRI provides a precise three-dimensional localization of neural activity, complementing ERPs in the assessment of deeper or subcortical structures, ERPs help to determine whether fMRI signal increases are related to increased excitatory or inhibitory activity, and allow a quantification of signal changes. These differences in the measurement substrate, however, also imply that the two methods should not always be expected to provide identical or converging results, or to show a one-to-one correspondence (see also McCarthy, 1999). For example, asynchronous activity of neurons, or activity of stellate cells only, might result in an increase in blood flow while producing no measurable ERP change. Also, brief changes in cortical activity might go undetected by fMRI while producing a clear change in the surface potential. Nevertheless, although a different, rather provocatively formulated position was recently presented (Nunez & Silberstein, 2000), such differences in brain maps should remain the exception rather than the rule. This should be even more the case when investigating cognitive functions, which are mainly supported by neocortical structures and usually require prolonged and synchronous neural processing.
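Both the sluggishness of the hemodynamic response and its preference for sustained over brief activity follow directly from a linear convolution model of the BOLD signal; the gamma-shaped kernel below is a toy stand-in for a measured hemodynamic response function:

```python
import numpy as np

dt = 0.1
t = np.arange(0.0, 30.0, dt)
kernel = (t / 6.0) ** 2 * np.exp(-t / 6.0)   # toy gamma-shaped HRF
kernel /= kernel.sum() * dt                  # normalize to unit area

def bold(neural):
    """Linear model: BOLD signal = neural activity convolved with the HRF."""
    return np.convolve(neural, kernel)[: len(t)] * dt

brief = np.zeros_like(t)
brief[:1] = 1.0                              # a single 100-ms burst
sustained = np.zeros_like(t)
sustained[: int(10.0 / dt)] = 1.0            # a sustained 10-s block

# The sustained block produces a far larger peak response than the brief
# burst, even though the instantaneous activity level is identical:
ratio = bold(sustained).max() / bold(brief).max()
```

Under this model a brief event still produces a (small) response, consistent with the 30-ms findings cited above, but the amplitude advantage of temporally extended activity is dramatic.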
2.3.2 Challenges in the combination of neuroimaging methods

As discussed in the previous chapter, the combination of fMRI and ERPs might indeed make sense and provide new and valuable insights into the neural bases of cognitive processing. However, this does not come without additional costs and challenges. The main cost is, of course, the additional time and money which has to be invested in data acquisition and analysis. Apart from this rather "secular" problem, one of the main challenges is the choice and set-up of an appropriate stimulation paradigm which evokes robust activity in both fMRI and ERPs. For instance, it seems inappropriate to use ERPs to investigate subcortical or cerebellar activities, or to use fMRI to image very short-lasting and/or rapidly changing neural activities. Apart from these obvious restrictions, the task paradigm used should show no or only negligible practice-related effects, since changes in processing strategy or task proficiency would affect the comparability of results when measurements are performed consecutively (as is the case in most studies). Also, although combinations of blocked-design fMRI and ERPs are still encountered, such comparisons have serious limitations. Hence, an event-locked presentation mode should also be implemented for the fMRI measurements. Another restriction which is rather unfamiliar to the EEG researcher is that several subject selection criteria have to be taken into account: subjects must not be claustrophobic, must not have any metallic implants, and have to be able to lie still in the scanner for at least an hour. Another, rather logistic problem is the management of measurements. It is mandatory to balance the sequence of measurements across subjects.
This should cancel out effects of learning, practice or task familiarity, as well as potential changes in motivation or familiarity with the investigation from the first to the second measurement session (see also the questionnaire results in chapters 6 and 7). From an analysis point of view, accurate co-registration is required if the results are to be displayed in a joint coordinate system. This issue is discussed in detail in chapter 3. Finally, when co-registering and comparing the results for interpretation, the limitations and strengths of the two methods, and the task designs used for data acquisition, have to be considered in order to fully exploit the complementary vs. convergent potential of the multi-modality data. If fMR images and high-resolution ERPs could be acquired simultaneously, tasks would not need to be robust against such practice effects. Although several reports demonstrated that EEG of sufficient quality can be recorded within the scanner (e.g., Ives et al., 1993; Goldman et al., 2000), it is evident that the data quality of both fMRI and EEG/ERPs cannot be as good as when data are recorded separately. This results from the considerable number of artifacts the two techniques mutually induce in each other's measurements. For example, pilot studies performed for this thesis showed that even the electrode sockets and the electrode gel used for SCP recordings caused considerable artifacts in the MR images.

3. Co-registration of EEG and MRI data using matching of spline-interpolated and MRI-segmented reconstructions of the scalp surface
Publication date: 2001